BACKGROUND OF THE INVENTION
1. Field of the Invention

This invention relates to an apparatus, method, and 
computer program product for information filtering, in a 
computer system receiving a data stream from a computer 
network. 

2. Description of the Relevant Art

Recent developments in computer networking,
particularly with regard to global computer internetworking,
offer vast amounts of stored and dynamic information to
interested users. Indeed, some estimate that hundreds of
thousands of news articles stream through the global
internetwork each day, and that the total number of files
transferred through the global internetwork (hereinafter
"network") is in the millions. As computer technology
evolves, and as more users participate in this form of
communication, the amount of information available on the
network will be staggering.



Although databases are relatively static and can be
searched using conventional network search engines, current
information filtering schemes are ill-suited to thoroughly
search the massive, dynamic stream of new information
passing through the network each day.

Presently, the information is organized, if at all, to
the extent that only skilled, persistent, and lucky
researchers can ferret out meaningful information.
Nevertheless, significant amounts of information may go
unnoticed. For example, because most existing information
filtering schemes focus on locating textual articles,
information in other forms (visual, audio, multimedia, and
patterned data) may be overlooked completely. From the
perspective of some users, a few items of meaningful
"information" can be obscured by the volume of irrelevant
data streaming through the network. Often, the information
obtained is inconsistent over a community of like-minded
researchers because of the nearly-infinite individual
differences in conceptualization and vocabulary within the
community. These inconsistencies exist with both the
content of the information and the manner in which a search
for the content is performed. Furthermore, the credibility
of the author, and the accuracy and quality of a given
article's content, and thus the article's "usefulness,"
often are questionable.



The problem of information overload can be more acute
for persons involved in multidisciplinary endeavors, e.g.,
medicine, law, and marketing, who are charged with
monitoring developments in diverse professional domains.
There are many reasons why users want to communicate with
each other about specific things as they find networked
resources. However, drawing attention to articles of common
interest to a community of researchers, or workgroup, often
requires a separate intervention, such as a telephone call,
electronic mail, and the like.

Often, membership in a workgroup or community is
sharply defined, and workers in one physical community may
be unaware of interesting developments in other workgroups
or communities, whether or not the communities are similar.
This isolation may be at the expense of serendipitous
discoveries that can arise from parallel developments in
unrelated or marginally-related fields.

Adding to the complexity of the information filtering
problem is that an individual user's interests may shift
over time, as may those of a community, and many existing
information filtering schemes are unable to accept shifts in
the individual's interest, the community's interest, or
both. Furthermore, information flow usually is
uni-directional to the user, and little characterization of
secondary user, or group, interests, e.g., the consumer
preferences of users primarily interested in molecular
biology or oenology, is derived and used to provide targeted
marketing to those users/consumers, and to follow changing
demographic trends.

Typically, identifying new information is effected by
monitoring all articles in a data stream, selecting those
articles having a specific topic, and searching through all
of the selected articles, perhaps thousands, each day. One
example is where users interact with a web browser to
retrieve documents from various document servers on the
network. Given the increasing impracticality of this
brute-force approach, the heterogenous nature of "information" on
the global internetwork, and the growing complexity of
social interactions that are evolving concurrently with
networking technology, there have been several attempts to
address some of the foregoing problems by using adaptive
information filtering systems.

In one approach, the information filtering is geared
toward content-based filtering. Here, the information
filtering system examines the user's patterns of keywords,
and semantic and contextual information, to map information
to a user's interests. This approach does not provide a
mechanism for collaborative activities within a group.

Another approach uses intelligent software agents to
learn a user's behavior, i.e., "watching over the shoulder,"
regarding certain types of textual information, for example,
electronic mail messages. In this scheme, the agents offer
to take action, e.g., delete the message, forward it, etc.,
on the basis of the user's prior responses to the content of
that certain information. Also, this scheme provides a
minor degree of inter-agent collaboration by allowing one
agent to draw upon the experience of other agents, typically
for the purpose of initialization. However, each agent is
constrained to develop its expertise in a particular domain
within the limited range of the type of information. Also,
the passive feedback nature of the "over-the-shoulder"
approach can place an unacceptable burden on the system's
learner, reducing information throughput and decreasing
the efficiency and usefulness of the overall system. Also,
systematic errors can be introduced into the passive
feedback error, and the actual response of the user may be
misinterpreted.

Another approach uses content-based filtering to select
documents for a user to read, and supports inter-user
collaboration by permitting the users in a defined group to
annotate the selected documents. Annotations tend to take
as many forms as there are users, placing the emphasis on
characterizing, maintaining, and manipulating a group of
diverse annotations, or "meta-documents," from different
users in conjunction with the original document.
Collaboration is achieved by enabling the filters of other
users to access the annotations. While this approach is
useful to the extent that other users can receive a deeper
understanding of the comments and criticism provided by a
particular user, the costs include the additional computer
effort required to implement such collaboration over large,
diverse groups and, more importantly, the extra time
required for each user to review the comments and criticism
of the annotations of the others. Also, annotation sharing
and filtering are hampered by the variety in vocabulary and
conceptualization among users.

Yet another approach employs collaborative filters to
help users make choices based on the opinions of other
users. The method employs rating servers to gather and
disseminate ratings. A rating server predicts a score, or
rating, based on the heuristic that people who agreed in the
past will probably agree again. This system is typically
limited to the homogenous stream of text-based news
articles, does little content-filtering, and cannot
accommodate heterogenous information.

Other projects have explored individual features such
as market-trading optimization techniques for prioritizing
incoming messages; rule-based agents for recognizing a user's
usage patterns and suggesting new filtering patterns to the
user; personal-adaptive recommendation systems using
exit-questions for rating documents and creating shared
recommendations; and the like. In each case, the
collaborative and content-based aspects of information
filtering are not integrated, and the filters are not
equipped to deal with heterogenous data streams.

Many information filtering systems use a weighted
average technique for user information feedback that, for
example, extracts all of the ratings for an article and
takes a simple weighted average over all of the ratings to
predict whether an article is relevant to a particular user.
Simple weighted averaging, however, tends to destroy the
information content contained in the ratings, unless a
relatively sophisticated approach is used for the functions
generating the simple weighted averages. Little impact is
given to factors such as credibility, personal preferences,
and the like, which factors tend to be irreversibly blurred
during the averaging process. Simple weighted averages,
then, can be lacking when one desires to develop information
filters that are well-fitted to a particular community and
the specific interests of a user unless innovative methods
are employed to preserve at least some of the relevant
information.
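The loss of information described above can be illustrated with a small
numerical sketch. The ratings, the 1-to-5 scale, and the equal weights
below are hypothetical and are not taken from any system discussed here;
the point is only that two very different sets of opinions can collapse
to the same weighted average.

```python
from statistics import pstdev


def weighted_average(ratings, weights):
    """Return the weighted mean of a list of ratings."""
    total_weight = sum(weights)
    return sum(r * w for r, w in zip(ratings, weights)) / total_weight


# Hypothetical ratings on a 1-5 scale from four equally weighted raters.
consensus = [3, 3, 3, 3]  # every rater finds the article mediocre
polarized = [1, 1, 5, 5]  # two raters dismiss it, two find it excellent
weights = [1.0, 1.0, 1.0, 1.0]

avg_consensus = weighted_average(consensus, weights)
avg_polarized = weighted_average(polarized, weights)

# The averages are identical, so the disagreement is invisible in them.
# The spread of the ratings is one piece of information the average drops.
spread_consensus = pstdev(consensus)
spread_polarized = pstdev(polarized)
```

Both averages come out to 3.0, while the spreads differ (0.0 versus 2.0).
A filter that keeps only the average cannot tell a uniformly lukewarm
article from one that a particular sub-community values highly.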

What is needed, then, is an apparatus and method for
information filtering in a computer system receiving a data
stream from a computer network in which entities of
information relevant to the user, or "informons," are
extracted from the data stream using content-based and
collaborative filtering. Such a system would employ an
adaptive content filter and an adaptive collaborative filter
which are integrated to the extent that an individual user
can be a unique member client of multiple communities with
each community including multiple member clients sharing
similar interests.

The system also would implement adaptive credibility
filtering, providing member clients with a measure of
informon credibility, as judged by other member clients in
the community. The system also may implement recommendation
filtering and consultation filtering. Furthermore, the
system would be preferred to be self-optimizing in that the
adaptive filters used in the system would seek optimal
values for the function intended by the filter, e.g.,
collaboration, content analysis, credibility, etc.






3. Citation of Relevant Publications 

In the context of the foregoing description of the
relevant art, and of the description of the present
invention which follows, the following publications can be
considered to be relevant:

Susan Dumais, et al. Using Latent Semantic Analysis to
Improve Access to Textual Information. In Proceedings
of CHI-88 Conference on Human Factors in Computing
Systems. (1988, New York: ACM)



David Evans et al. A Summary of the CLARIT Project.
Technical Report, Laboratory for Computational
Linguistics, Carnegie-Mellon University, September
1991.

G. Fischer and C. Stevens. Information Access in
Complex, Poorly Structured Information Spaces. In
Proceedings of CHI-91 Conference on Human Factors in
Computing Systems. (1991: ACM)

D. Goldberg, et al. Using Collaborative Filtering to
Weave an Information Tapestry. Communications of the
ACM, 35, 12 (1992), pp. 61-70.



Simon Haykin. Adaptive Filter Theory. Prentice-Hall,
Englewood Cliffs, NJ (1986), pp. 100-380.

Simon Haykin. Neural Networks: A Comprehensive
Foundation. Macmillan College Publishing Co., New York
(1994), pp. 18-589.



Yezdi Lashkari, et al. Collaborative Interface Agents.
In Conference of the American Association for
Artificial Intelligence. Seattle, WA, August 1994.

Paul Resnick, et al. GroupLens: An Open Architecture
for Collaborative Filtering of Netnews. In Proceedings
of ACM 1994 Conference on Computer Supported
Cooperative Work. pp. 175-186.



Anil Rewari, et al. AI Research and Applications In
Digital's Service Organization. AI Magazine: 68-69
(1992).

J. Rissanen. Modelling by Shortest Data Description,
Automatica, 14:465-471 (1978).

Gerard Salton. Developments in Automatic Text
Retrieval. Science, 253:974-980, August 1991.

C. E. Shannon. A Mathematical Theory of Communication.
Bell Sys. Tech. Journal, 27:379-423 (1948).

Beerud Sheth. A Learning Approach to Personalized
Information Filtering, Master's Thesis, Massachusetts
Institute of Technology, February, 1994.



F. Mosteller, et al. Applied Bayesian and Classical
Inference: The Case of the Federalist Papers.
Springer-Verlag, New York (1984), pp. 65-66.

T. W. Yan et al. Index Structures for Selective
Dissemination of Information. Technical Report
STAN-CS-92-1454, Stanford University (1992).

Yiming Yang. An Example-Based Mapping Method for Text
Categorization and Retrieval. ACM Transactions on
Information Systems. Vol. 12, No. 3, July 1994, pp.
252-277.




SUMMARY OF THE INVENTION

The invention herein provides a method for information
filtering in a computer system receiving a data stream from
a computer network. Embedded in the data stream are raw
informons, with at least one of the raw informons being of
interest to the user. The user is a member client of a
community. The method includes the steps of providing a
dynamic informon characterization having a plurality of
profiles encoded therein, the plurality of profiles
including an adaptive content profile and an adaptive
collaboration profile; adaptively filtering the raw
informons responsive to the dynamic informon
characterization, producing a proposed informon thereby;
presenting the proposed informon to the user; receiving a
feedback profile from the user, responsive to the proposed
informon; adapting at least one of the adaptive content
profile and the adaptive collaboration profile responsive to
the feedback profile; and updating the dynamic informon
characterization responsive to the previous step of
adapting. The method is an interactive, distributed,
adaptive filtering method which includes community filtering
and client filtering. This filtering respectively produces a
community profile and a member client profile. Each of the
community filtering and client filtering can be responsive
to the adaptive content profile and the adaptive
collaboration profile. Furthermore, the dynamic informon
characterization is adapted in response to the community
profile, the member client profile, or both. The dynamic
informon characterization includes a prefiltering profile,
an adaptive broker filtering profile, and a member client
profile. Also, adaptively filtering includes the steps of
prefiltering the data stream according to the prefiltering
profile, thereby extracting a plurality of raw informons
from the data stream, the prefiltering profile being
responsive to the adaptive content profile; filtering the
raw informons according to the adaptive broker profile, the
adaptive broker profile including the adaptive collaborative
profile and the adaptive content profile; and client user
filtering the raw informons according to an adaptive member
client profile, thereby extracting the proposed informon.
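The three filtering stages and the feedback step recited above can be
sketched roughly as follows. This is only an illustrative reading, not
the claimed implementation: the Profile class, the keyword-weight
representation, the threshold, and the additive update rule are all
assumptions introduced for the example, and a single keyword profile
stands in for the combined content and collaboration profiles of the
broker stage.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical profile: a mapping from keyword to weight."""
    weights: dict = field(default_factory=dict)

    def score(self, informon):
        # Sum the weights of the profile keywords found in the informon text.
        return sum(self.weights.get(word, 0.0) for word in informon.split())

    def adapt(self, informon, feedback, rate=0.5):
        # Shift each word's weight in the direction of the feedback
        # (+1.0 for a liked informon, -1.0 for a disliked one).
        for word in set(informon.split()):
            self.weights[word] = self.weights.get(word, 0.0) + rate * feedback


def filter_stream(stream, prefilter, broker, client, threshold=0.0):
    """Apply prefiltering, broker filtering, then member client filtering."""
    raw = [d for d in stream if prefilter.score(d) > threshold]
    brokered = [d for d in raw if broker.score(d) > threshold]
    return [d for d in brokered if client.score(d) > threshold]


prefilter = Profile({"fish": 1.0, "sword": 1.0})
broker = Profile({"tropical": 1.0, "antique": 1.0})
client = Profile({"tropical": 1.0})
stream = ["tropical fish care", "antique sword auction", "stock market news"]

# First pass: only the tropical fish item survives all three stages.
proposed = filter_stream(stream, prefilter, broker, client)

# The user rejects the proposed informon; the broker profile adapts.
broker.adapt(proposed[0], feedback=-1.0, rate=2.0)
proposed_after = filter_stream(stream, prefilter, broker, client)
```

After the negative feedback the same item no longer passes the broker
stage, which is the adaptive behavior the method describes in outline.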

Another embodiment of the method provides the steps of
partitioning each user into a plurality of member clients,
each member client having a unique member client profile,
each profile having a plurality of client attributes;
grouping member clients to form a plurality of communities,
each community including selected clients of the plurality
of member clients, selected client attributes of the
selected clients being comparable to others of the selected
clients thereby providing each community with a community
profile having common client attributes; predicting at least
one community profile for each community using first
prediction criteria; predicting at least one member client
profile for the client in a community using second
prediction criteria; extracting the raw informons from the
data stream, each of the raw informons having an informon
content; selecting proposed informons from the raw
informons, the proposed informons being correlated with at
least one of the common client attributes and the member
client attributes; providing the proposed informons to the
user; receiving user feedback in response to the proposed
informons; and updating at least one of the first and second
prediction criteria responsive to the user feedback. The
method also can include the step of prefiltering the data
stream using the predicted community profile, with the
predicted community profile identifying the raw informons in
the data stream.

In addition, the step of selecting can include
filtering the raw informons using an adaptive content filter
responsive to the informon content; filtering the raw
informons using an adaptive collaboration filter responsive
to the common client attributes for the respective
community; and filtering the raw informons using an adaptive
member client filter responsive to the unique member client
profile.

The method also can include one or more of the steps of
credibility filtering, recommendation filtering, and
consultation filtering the raw informon responsive to the
feedback profile and providing a respective adaptive
recommendation profile and adaptive consultation profile.
The step of prefiltering includes the step of creating a
plurality of mode-invariant concept components for each of
the raw informons; and the step of filtering the raw
informons includes the steps of (1) concept-based indexing
of each of the mode-invariant concepts into a collection of
indexed informons; and (2) selecting the community profile
from the collection of indexed informons.

One embodiment of the present invention provides an
information filtering apparatus in a computer system
receiving a data stream from a computer network, the data
stream having raw informons embedded therein. The apparatus
includes an extraction means for identifying and extracting
the raw informons from the data stream, each of the
informons having informon content, at least one of the raw
informons being of interest to a user having a user profile,
the user being a member of a network community having a
community profile, at least a portion of each of the user
profile and the community profile creating an adaptive
collaboration profile, the extraction means being coupled to
the computer network. The apparatus also includes filter
means for adaptively filtering the raw informons responsive
to the adaptive collaboration profile and an adaptive
content profile and producing a proposed informon thereby,
the informon content being filtered according to the
adaptive content profile, the filter means being coupled
with the extraction means. Additionally, the apparatus
includes communication means for conveying the proposed
informon to the user and receiving a feedback response
therefrom, with the feedback response corresponding to a
feedback profile, the communication means being coupled with
the filter means.

Profile adaptation is accomplished by a first
adaptation means for adapting at least one of the
collaboration profile and the content profile responsive to
the feedback profile, the first adaptation means being
coupled to the filter means. The first adaptation means
includes a prediction means for predicting a response of the
user to a proposed informon, the prediction means receiving
a plurality of temporally-spaced feedback profiles and
predicting at least a portion of a future one of the
adaptive collaboration profile and the adaptive content
profile in response thereto. Also included are computer
storage means for storing the adaptive collaborative profile
and the adaptive content profile, the storage means being
coupled to the filter means.

The apparatus also includes second adaptation
means for adapting at least one of the user profile
responsive to at least one of the community profile and the
adaptive content profile, and the community profile
responsive to at least one of the user profile and the
content profile, and the content profile responsive to at
least one of the user profile and the community profile. It
is preferred that the prediction means is a self-optimizing
prediction means using a preselected learning technique, and
that learning technique includes at least one of a
top-key-word-selection learning technique, a nearest-neighbor
learning technique, a term-weighting learning technique, a
probabilistic learning technique, and a neural network
learning technique.
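Of the learning techniques listed, the nearest-neighbor technique is
perhaps the simplest to illustrate. The sketch below is an assumption
about how such a predictor might look, not the predictor of the
invention: informons are represented as hypothetical term-weight
dictionaries, similarity is cosine similarity, and the predicted rating
is the plain mean of the k most similar previously rated informons.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(candidate, rated, k=2):
    """Average the ratings of the k rated informons most similar to candidate."""
    if not rated:
        raise ValueError("at least one rated informon is required")
    neighbors = sorted(
        rated, key=lambda item: cosine(candidate, item[0]), reverse=True
    )[:k]
    return sum(rating for _, rating in neighbors) / len(neighbors)


# Hypothetical feedback history: (term vector, rating on a 1-5 scale).
history = [
    ({"fish": 1.0, "tank": 1.0}, 5.0),
    ({"fish": 1.0, "antibiotic": 1.0}, 4.0),
    ({"stock": 1.0, "market": 1.0}, 1.0),
]

prediction = predict_rating({"fish": 1.0, "water": 1.0}, history, k=2)
```

Here the two fish-related informons are the nearest neighbors, so the
predicted rating is their mean, 4.5; the unrelated stock-market item
does not influence the prediction.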



BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a diagrammatic representation of an
embodiment of an information filtering apparatus according
to the present invention.

Figure 2 is a diagrammatic representation of another
embodiment of an information filtering apparatus according
to the present invention.

Figure 3 is a flow diagram for an embodiment of an
information filtering method according to the present
invention.

Figure 4 is a flow diagram for another embodiment of an 
information filtering method according to the present 
invention. 

Figure 5 is a flow diagram for yet another embodiment
of an information filtering method according to the present
invention.

Figure 6 is an illustration of a three-component-input
model and profile with associated predictors.

Figure 7 is an illustration of a mindpool hierarchy. 

DETAILED DESCRIPTION OF THE EMBODIMENTS 

The invention herein provides an apparatus and method
for information filtering in a computer system receiving a
data stream from a computer network, in which entities of
information relevant to the user, or "informons," are
extracted from the data stream using content-based and
collaborative filtering. The invention is both interactive
and distributed in structure and method. It is interactive
in that communication is substantially bi-directional at
each level of the invention. It is distributed in that all
or part of the information filter can include a purely
hierarchical (up-and-down/parent-child) structure or method,
a purely parallel (peer-to-peer) structure or method, or a
combination of hierarchical and parallel structures and
methods. The invention also provides a computer program
product that implements selected embodiments of the
apparatus and method.

As used herein, the term "informon" comprehends an 
information entity of potential or actual interest to a 
particular user. In general, informons can be heterogenous 
in nature and can be all or part of a textual, a visual, or 
an audio entity. Also, informons can be composed of a 
combination of the aforementioned entities, thereby being a 
multimedia entity. Furthermore, an informon can be an 
entity of patterned data, such as, for example, a data file
containing a digital representation of signals, and can be a
combination of any of the previously-mentioned entities.
Although some of the data in a data stream may be included
in an informon, not all data is relevant to a user, and data
that is not relevant is not within the definition of an
informon. By analogy, an informon may be considered to be a
"signal," and the total data stream may be considered to be
"signal + noise." Therefore, an information filtering
apparatus is analogous to other types of signal filters in
that it is designed to separate the "signal" from the
"noise."

Also as used herein, the term "user" is an individual 
in communication with the network. Because an individual 
user can be interested in multiple categories of 
information, the user can be considered to be multiple 
clients each having a unique profile, or set of attributes. 
Each member client profile, then, is representative of a 
particular group of user preferences. Collectively, the
member client profiles associated with each user form the
user profile. The present invention can apply the learned
knowledge of one of a user's member clients to others of the
user's member clients, so that the importance of the learned
knowledge, e.g., the user's preference for a particular
author A in one interest area as represented by the member
client, can increase the importance of that particular
factor, A's authorship, for others of the user's member
clients. Each of the clients of one user can be associated
with the individual clients of other users insofar as the
profiles of the respective clients have similar attributes.
A "community" is a group of clients, called member clients,
that have similar member client profiles, i.e., that share a
subset of attributes or interests. In general, the subset
of shared attributes forms the community profile for a given
community and is representative of the community norms, or
common client attributes.
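The relationship among users, member clients, and communities defined
above can be pictured with a small sketch. The names, the attribute
sets, and the use of a strict set intersection to obtain the common
client attributes are assumptions made for illustration; the text only
requires that the community profile be a shared subset of attributes.

```python
def community_profile(member_profiles):
    """Return the attributes shared by every member client in a community."""
    profiles = list(member_profiles.values())
    if not profiles:
        return set()
    shared = set(profiles[0])
    for attributes in profiles[1:]:
        shared &= set(attributes)
    return shared


# Hypothetical users, each partitioned into one member client per interest area.
users = {
    "alice": {
        "alice/biology": {"molecular biology", "genetics", "author:A"},
        "alice/wine": {"oenology", "author:A"},
    },
    "bob": {
        "bob/biology": {"molecular biology", "genetics", "proteomics"},
    },
}

# A community groups member clients of different users with similar profiles.
biology_community = {
    "alice/biology": users["alice"]["alice/biology"],
    "bob/biology": users["bob"]["bob/biology"],
}

norms = community_profile(biology_community)
```

In this sketch the community norms are "molecular biology" and
"genetics". The user "alice" remains a single user whose user profile is
the collection of both of her member client profiles, only one of which
belongs to this community.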

The "relevance" of a particular informon broadly 
describes how well it satisfies the user's information need. 
The more relevant an informon is to a user, the higher the 
"signal" content. The less relevant the informon, the 
higher the "noise" content. Clearly, the notion of what is 
relevant to a particular user can vary over time and with 
context, and the user can find the relevance of a particular 
informon limited to only a few of the user's potentially 
vast interest areas. Because a user's interests typically 
change slowly, relative to the data stream, it is preferred 
to use adaptive procedures to track the user's current 
interests and follow them over time. Provision, too, is 
preferred to be made for sudden changes in interest, e.g., 
taking up antiquarian sword collecting and discontinuing 
stamp collecting, so that the method and apparatus track the 
evolution of "relevance" to a user and the communities of 
which the user is a member. In general, information
filtering is the process of selecting the information that a
user wishes to see, i.e., informons, from a large amount of
data. Content-based filtering is a process of filtering by
extracting features from the informon, e.g., the text of a
document, to determine the informon's relevance.
Collaborative filtering, on the other hand, is the process
of filtering informons, e.g., documents, by determining what
informons other users with similar interests or needs found
to be relevant.
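The distinction between the two kinds of filtering can be made concrete
with a short sketch. Everything in it is hypothetical: the keyword
weights, the peer ratings, and in particular the linear blend controlled
by the mix parameter, which is just one simple way of combining a
content-based score with a collaborative score.

```python
def content_score(informon_terms, content_profile):
    """Content-based: sum the profile weights of terms found in the informon."""
    return sum(content_profile.get(term, 0.0) for term in informon_terms)


def collaborative_score(informon_id, peer_ratings):
    """Collaborative: mean rating from like-minded peers, or 0.0 if unrated."""
    ratings = [r[informon_id] for r in peer_ratings.values() if informon_id in r]
    return sum(ratings) / len(ratings) if ratings else 0.0


def relevance(informon_id, informon_terms, content_profile, peer_ratings, mix=0.5):
    """Blend the two scores; mix=1.0 is purely content-based."""
    return (mix * content_score(informon_terms, content_profile)
            + (1.0 - mix) * collaborative_score(informon_id, peer_ratings))


# Hypothetical profile of a member client interested in antique swords.
content_profile = {"sword": 1.0, "antique": 0.5}
# Hypothetical ratings by other member clients of the same community.
peer_ratings = {"peer1": {"doc1": 1.0}, "peer2": {"doc1": 0.5}}

score = relevance("doc1", ["antique", "sword", "auction"],
                  content_profile, peer_ratings)
```

The content score here is 1.5 (extracted from the informon itself), the
collaborative score is 0.75 (derived from what similar users found
relevant), and the blended relevance is 1.125.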

The invention employs adaptive content filters and
adaptive collaborative filters, which respectively include,
and respond to, an adaptive content profile and an adaptive
collaboration profile. The adaptive filters each are
preferred to include at least a portion of a community
filter for each community serviced by the apparatus, and a
portion of a member client filter for each member client of
the serviced communities. For this reason, the adaptive
filtering is distributed in that each of the community
filters performs adaptive collaborative filtering and
adaptive content filtering, even if on different levels, and
even if many filters exist on a given level. The integrated
filtering permits an individual user to be a unique member
client of multiple communities, with each community
including multiple member clients sharing similar interests.
The adaptive features permit the interests of member clients
and entire communities to change gradually over time. Also,
a member client has the ability to indicate a sudden change
in preference, e.g., the member client remains a collector
but is no longer interested in coin collecting.

The invention also implements adaptive credibility
filtering, providing member clients with a measure of
informon credibility, as judged by other member clients in
the community. For example, a new member client in a first
community, having no credibility, can inject an informon
into the data flow, and other member clients in other
communities can be provided with the proposed informon,
based on the respective community profile and member client
profile. If the other member clients believe the content of
the informon to be credible, the adaptive credibility
profile will reflect a growing credibility. Conversely,
feedback profiles from informon recipients that indicate a
lack of credibility cause the adaptive credibility profile,
for the informon author, to reflect untrustworthiness.
However, the growth and declination of credibility are not
"purely democratic," in the sense that one's credibility is
susceptible to the bias of others' perceptions, so the
growth or declination of one's credibility is generally
proportional to how the credibility of the member client is
viewed by other member clients. Member clients can put
their respective reputations "on the line," and engage in
spirited discussions which can be refereed by other
interested member clients. The credibility profile further
can be partitioned to permit separate credibility
sub-profiles for the credibility of the content of the
informon, the author, the author's community, the reviewers,
and the like, and can be fed back to discussion
participants, reviewers, and observers to monitor the
responses of others to the debate. The adaptive credibility
profiles for those member clients with top credibility
ratings in their communities may be used to establish those
member clients as "experts" in their respective communities.
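One possible reading of the "not purely democratic" behavior is that
each reviewer's vote counts in proportion to that reviewer's own
credibility. The sketch below follows that reading only as an
assumption; the 0-to-1 credibility scale, the vote range of -1 to +1,
and the learning rate are hypothetical and are not specified in the
text.

```python
def update_credibility(author_cred, feedback, reviewer_cred, rate=0.1):
    """Adjust an author's credibility from credibility-weighted votes.

    feedback maps reviewer name -> vote in [-1.0, 1.0].
    reviewer_cred maps reviewer name -> credibility in [0.0, 1.0].
    """
    total = sum(reviewer_cred.get(name, 0.0) for name in feedback)
    if total == 0.0:
        # No reviewer carries any weight, so nothing changes.
        return author_cred
    weighted_vote = sum(
        reviewer_cred.get(name, 0.0) * vote for name, vote in feedback.items()
    ) / total
    # Clamp the result to the assumed 0-to-1 credibility scale.
    return min(1.0, max(0.0, author_cred + rate * weighted_vote))


reviewer_cred = {"expert": 0.75, "novice": 0.25}
feedback = {"expert": 1.0, "novice": -1.0}

# A new member client starts with no credibility.
new_cred = update_credibility(0.0, feedback, reviewer_cred, rate=0.5)
```

With one vote for and one against, a purely democratic tally would leave
the author's credibility unchanged. Because the favorable vote comes
from the more credible reviewer, the author's credibility instead rises
from 0.0 to 0.25 in this sketch.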

With this functionality, additional features can be
implemented, including, for example, "instant polling" on a
matter of political or consumer interest. In conjunction
with both content and collaborative filtering, credibility
filtering and the resulting adaptive credibility profiles
also may be used to produce other features, such as on-line
consultation and recommendation services. Although the
"experts" in the communities most closely related to the
topic can be afforded special status as such, member clients
from other communities also can participate in the
consultation or recommendation process.

In one embodiment of the consultation service,
credibility filtering can be augmented to include
consultation filtering. With this feature, a member client
can transmit an informon to the network with a request for
guidance on an issue, for example, caring for a sick
tropical fish. Other member clients can respond to the
requester with informons related to the topic, e.g.,
suggestions for water temperature and antibiotics. The
informons of the responders can include their respective
credibility profiles, community membership, and professional
or avocational affiliations. The requester can provide
feedback to each of the responders, including a rating of
the credibility of the responder on the particular topic.
Additionally, the responders can accrue quality points,
value tokens, or "info bucks," as apportioned by the
requester, in return for useful guidance.
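How a requester apportions value tokens is left open above. As one
hypothetical scheme, the sketch below splits a whole-number budget of
"info bucks" among responders in proportion to the requester's integer
usefulness ratings, giving any indivisible remainder to the top-rated
responder. The function name, the budget, and the ratings are all
assumptions made for the example.

```python
def apportion(budget, usefulness):
    """Split an integer budget among responders in proportion to their ratings."""
    total = sum(usefulness.values())
    if total <= 0:
        # Nobody was rated useful, so nobody is paid.
        return {name: 0 for name in usefulness}
    # Integer shares, rounded down, so the budget is never overspent.
    shares = {name: budget * rating // total for name, rating in usefulness.items()}
    # Hand the undistributed remainder to the highest-rated responder.
    remainder = budget - sum(shares.values())
    best = max(usefulness, key=usefulness.get)
    shares[best] += remainder
    return shares


# Hypothetical ratings given by the owner of the sick tropical fish.
ratings = {"vet": 3, "hobbyist": 1, "bystander": 0}
payout = apportion(10, ratings)
```

With a budget of 10 and ratings of 3, 1, and 0, the payout in this
sketch is 8, 2, and 0 respectively, and the full budget is distributed.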

Similarly, one embodiment of an on-line recommendation
service uses recommendation filtering and adaptive
recommendation profiles to give member clients fora for
obtaining recommendations on matters as diverse as local
auto mechanics and world-class medieval armor refurbishers.
In this embodiment, the requester can transmit the informon
to the network bearing the request for recommendation.
Other member clients can respond to the requester with
informons having specific recommendations or
dis-recommendations, advice, etc. As with the consultation
service, the informons of the responders can be augmented to
include their respective credibility profiles, community
membership, and professional or avocational affiliations. A
rating of each recommendation provided by a responder,
relative to other responders' recommendations, also can be
supplied. The requester can provide feedback to each of the
responders, including a rating of the credibility of the
responder on the particular topic, or the quality of the
recommendation. As before, the responders can accrue
quality points, value tokens, or "info bucks," as
apportioned by the requester, in return for the useful
recommendation.

Furthermore, certain embodiments of the invention are
preferred to be self-optimizing in that some or all of
the adaptive filters used in the system dynamically seek
optimal values for the function intended by the filter,
e.g., content analysis, collaboration, credibility,
reliability, etc.

The invention herein is capable of identifying the
preferences of individual member clients and communities,
providing direct and inferential consumer preference
information, and tracking shifts in the preferences, whether
the shifts be gradual or sudden. The consumer preference
information can be used to target particular consumer
preference groups, or cohorts, and provide members of the
cohort with targeted informons relevant to their consumer
preferences. This information also may be used to follow
demographical shifts so that activities relying on accurate
demographical data, such as retail marketing, can use the
consumer preference information to anticipate evolving
consumer needs in a timely manner.

To provide a basis for adaptation, it is preferred that 
each raw informon be processed into a standardized vector, 
which may be on the order of 20,000 to 100,000 tokens long. 
The learning and optimization methods that ultimately are 
chosen are preferred to be substantially robust to the 
problems which can be presented by such high-dimensional 
input spaces. Dimensionality reduction methods, such as
the singular value decomposition (SVD) or auto-encoding
neural networks, attempt to reduce the size of the space
while initially retaining the information contained in the
original representation. However, the SVD can lose
information during the transformation and may give inferior
results. Two adaptation/learning methods that are presently
preferred include the TF-IDF technique and the MDL
technique.

TF-IDF is a weighting scheme that gives emphasis to the
weighting parameters for more important terms in an
informon. TF represents "term frequency," or the number of
times a particular term occurs in a given informon. This is
but one factor used in developing the weighting. IDF
represents "inverse-document-frequency," which is an inverse
measure of how often a particular term appears across a
group of informons. Typically, common words have a low IDF,
and unique terms will have a high IDF.

The TF-IDF weighting technique employs two empirical
observations regarding text. First, the more times a token
t appears in a document d (called the term frequency, or
$tf_{t,d}$), the more likely it is that t is relevant to the
topic of d. Second, the more times t occurs throughout all
documents (called the document frequency, or $df_t$), the
more poorly t discriminates between documents. For a given
document, these two terms can be combined into weights by
multiplying the tf by the inverse of the df (i.e., idf) for
each token. Often, the logarithm of tf or idf is taken in
order to de-emphasize the increases in weight for larger
values.



One weight used for token t in document d is:

$$w(t,d) = tf_{t,d} \, \log\left(|N| / df_t\right)$$

where N is the entire set of documents. The way in which
TF-IDF vectors are compared also takes advantage of the
domain. Because documents usually contain only a small
fraction of the total vocabulary, the significance of a word
appearing is much greater than of it not appearing. To
emphasize the stronger information content in a word
appearing, the cosine of the angle between vectors is used
to measure the similarity between them. The effect of this
cosine similarity metric can be better understood by the
following example. Suppose two documents each contain a
single word, but the words are different. The similarity of
the documents then would be zero, because the cosine of the
angle between two perpendicular vectors is zero. A more
unbiased learning technique that did not take advantage of
this domain feature usually would group the two documents as
being very similar because all but two of the elements in
the lengthy vectors agreed (i.e., they were zero).
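
The weighting and the cosine comparison described above can be
illustrated with a minimal sketch. It assumes documents are
already tokenized into lists of strings and stores each vector
sparsely as a dictionary; the function names `tfidf_vectors`
and `cosine` and the sample documents are hypothetical and are
not taken from the specification.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict token -> weight) per document.

    Uses w(t, d) = tf(t, d) * log(|N| / df(t)), the weight given in the text.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))          # count each token once per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Two single-word documents with different words, plus one longer document.
docs = [["fish"], ["armor"], ["fish", "tank", "fish"]]
vecs = tfidf_vectors(docs)
single_word_similarity = cosine(vecs[0], vecs[1])
```

Because the first two documents share no token, their vectors
are perpendicular and the similarity is zero, which matches the
single-word example in the text. Only tokens present in a
document contribute to the dot product, so absent tokens never
make two documents look alike.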

Using TF-IDF and the cosine similarity metric, there
are many ways to then classify documents into categories, as
recognized by a skilled artisan. For example, any of the
family of nearest-neighbor techniques could be used. In the
present invention, the informons in each category can be
converted into TF-IDF vectors, normalized to unit length,
and then averaged to get a prototype vector for the
category. The advantages of taking this approach include an
increased speed of computation and a more compact
representation. To classify a new document, the document
can be compared with each prototype vector and given a
predicted rating based on the cosine similarities to each
category rating. In this step, the results can be converted
from a categorization procedure to a continuous value, using
a linear regression.
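
The prototype approach can be sketched as follows, under the
assumption that TF-IDF vectors are already available as sparse
dictionaries and that rating categories are small integers.
The names `build_prototypes` and `similarities` and the sample
data are hypothetical. The sketch stops at the per-category
similarities; the final linear regression to a continuous
rating is left out.

```python
import math
from collections import defaultdict

def normalize(vec):
    """Scale a sparse vector (dict token -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def build_prototypes(rated_vectors):
    """Average the unit-length vectors of each rating category.

    rated_vectors: iterable of (rating_category, sparse_vector) pairs.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for category, vec in rated_vectors:
        counts[category] += 1
        for t, w in normalize(vec).items():
            sums[category][t] += w
    return {c: {t: w / counts[c] for t, w in s.items()} for c, s in sums.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarities(vec, prototypes):
    """Cosine similarity of one document vector to every category prototype."""
    return {c: cosine(vec, p) for c, p in prototypes.items()}

# Hypothetical training data: (rating category, TF-IDF vector).
training = [
    (5, {"fish": 2.0, "tank": 1.0}),
    (5, {"fish": 1.0, "antibiotic": 1.0}),
    (1, {"armor": 3.0}),
]
prototypes = build_prototypes(training)
scores = similarities({"fish": 1.0}, prototypes)
best = max(scores, key=scores.get)
```

A new document is compared with one vector per category rather
than with every training informon, which is the source of the
speed and compactness advantages noted above.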

Probabilistic techniques consider the probability that
a particular term, or concept, occurs in an informon,
or that the informon satisfies the user's information need.
Minimum description length, or MDL, is a probabilistic
technique that attempts to minimize the description length
of an entire data set. The MDL principle can be applied to
measure the overall "quality" and "cost" of a predicted data
set, or model, and to optimize both quality and cost,
striking a balance between the quality of the prediction and
the complexity cost for achieving that quality.



The Minimum Description Length (MDL) Principle provides
an information-theoretic framework for balancing the
tradeoff between model complexity and training error. In
the present invention's domain, this tradeoff involves how
to weight each token's importance and how to decide which
tokens should be left out of the model for not having enough
discriminatory power. The MDL principle is based on Bayes'
Rule:

$$p(H \mid D) = \frac{p(D \mid H)\, p(H)}{p(D)}$$

Generally, it is desirable to find a hypothesis H that
maximizes p(H|D), i.e., the probability of H given the
observed data D. By Bayes' Rule, this is equivalent to
maximizing p(D|H)p(H)/p(D). Because p(D) is essentially
independent of H, p(D|H)p(H) can be maximized; or,
equivalently,

$$-\log(p(D \mid H)) - \log(p(H))$$

can be minimized. From information theory principles,
$-\log_2(p(X))$ is equal to the size in bits of encoding
event X in an optimal binary code. Therefore, the MDL
interpretation of the above expression is that, to find the
most probable hypothesis given the data, the hypothesis
which minimizes the total encoding length should be found.
This encoding length is equal to the number of bits required
to encode the hypothesis, plus the bits required to encode
the data given the hypothesis. Given a document d with
token vector $T_d$ (containing $l_d$ non-zero unique tokens
in the informon) and training data $D_{train}$, the most
probable category $c_i$ for d is that which minimizes the
bits needed to encode $T_d$ plus $c_i$:

$$\arg\max_{c_i} \left[ p(c_i \mid T_d, l_d, D_{train}) \right] = \arg\min_{c_i} \left[ -\log(p(T_d \mid c_i, l_d, D_{train})) - \log(p(c_i \mid l_d, D_{train})) \right]$$

The data independence assumption is that the probability of
the data in an informon or document, given its length and
category, is the product of the individual token
probabilities, that is,

$$p(T_d \mid c_i, l_d, D_{train}) = \prod_i p(t_{i,d} \mid c_i, l_d, D_{train})$$

where $t_{i,d}$ is a binary value indicating whether or not
the token i occurred at least once in document d.
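
As an illustration of choosing the category with the shortest
encoding under the independence assumption, the sketch below
sums per-token costs in bits and adds the cost of the category
itself. It assumes, for simplicity, occurrence probabilities
that do not depend on document length and that lie strictly
between 0 and 1. The category names, probabilities, and
function names are hypothetical.

```python
import math

def encoding_bits(doc_tokens, vocabulary, p_occurs, p_category):
    """Bits to encode a category plus a document's binary token vector.

    p_occurs[t] is the assumed probability that token t occurs at least once
    in a document of this category; tokens are treated as independent.
    """
    bits = -math.log2(p_category)
    for t in vocabulary:
        p = p_occurs[t] if t in doc_tokens else 1.0 - p_occurs[t]
        bits += -math.log2(p)
    return bits

def most_probable_category(doc_tokens, vocabulary, models, priors):
    """Return the category with the shortest total encoding length."""
    return min(
        models,
        key=lambda c: encoding_bits(doc_tokens, vocabulary, models[c], priors[c]),
    )

# Hypothetical per-category occurrence probabilities.
vocabulary = ["fish", "tank", "armor"]
models = {
    "aquaria": {"fish": 0.8, "tank": 0.6, "armor": 0.05},
    "medieval": {"fish": 0.1, "tank": 0.05, "armor": 0.9},
}
priors = {"aquaria": 0.5, "medieval": 0.5}
choice = most_probable_category({"fish", "tank"}, vocabulary, models, priors)
```

Minimizing the summed negative logarithms is the same as
maximizing the product of the token probabilities times the
category prior, so the shortest code and the most probable
category coincide.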

Generally, one way to derive a probability estimate for
$t_{i,d}$ while avoiding a computationally expensive
optimization step for the model parameters is to compute the
following additional statistics from the training data, and
use them as the parameters in the model:

$$t_i = \sum_{j \in N} t_{i,j}$$

where $t_i$ is the number of documents containing token i,
and

$$r_{i,l}$$

where $r_{i,l}$ is a correlation estimate [0-1] between
$t_{i,d}$ and $l_d$.

Each statistic can be computed for each concept, and
for the total across all concepts. The objective is to
establish a general "background" distribution for each
token, and a category-specific distribution. If the token
distribution is a simple binomial, independent of document
length,

$$p(t_{i,d} = 0 \mid [c_k]) = 1 - t_{i[,c_k]} / |N_{[c_k]}|$$

However, if the token probability is dependent on document
length, the following approximation is valid:

$$p(t_{i,d} = 0 \mid l_d [, c_k]) = \left(1 - t_{i[,c_k]} / \sum_{j \in N_{[c_k]}} l_j\right)^{l_d}$$

The above two distributions can then be combined in a
mixture model by weighting them with $r_{i,l}$ to provide:

$$p(t_{i,d} = 0 \mid l_d [, c_k]) = \left(1 - t_{i[,c_k]} / |N_{[c_k]}|\right)^{1 - r_{i,l}^2} \left(1 - t_{i[,c_k]} / \sum_{j \in N_{[c_k]}} l_j\right)^{l_d \, r_{i,l}^2}$$



By hypothesizing that each token either truly has a
specialized distribution for a category, or that the token
is unrelated to that category and just exhibits random
background fluctuations, the MDL criterion for making the
decision between these hypotheses is to choose the
category-specific hypothesis if the total bits saved in
using this hypothesis,

$$\text{Total bits} = \sum_{d \in N_{c_k}} \left( -\log(p(t_{i,d} \mid l_d)) - \left[ -\log(p(t_{i,d} \mid l_d, c_k)) \right] \right)$$

is greater than the complexity cost of including the extra
category-specific parameters.
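
The decision rule can be sketched for a single token and a
single category as shown below. The sketch assumes that the
background and category-specific occurrence probabilities are
already estimated and do not depend on document length, and
that the complexity cost is supplied as a number of bits; the
value 8.0 and all names are hypothetical.

```python
import math

def bits(occurred, p_occurs):
    """Bits to encode one binary token observation given P(token occurs)."""
    return -math.log2(p_occurs if occurred else 1.0 - p_occurs)

def bits_saved(observations, p_background, p_category):
    """Total bits saved over a category's documents by the category-specific model."""
    return sum(bits(o, p_background) - bits(o, p_category) for o in observations)

def use_category_model(observations, p_background, p_category, complexity_cost_bits):
    """Keep category-specific parameters only if the savings exceed their cost."""
    return bits_saved(observations, p_background, p_category) > complexity_cost_bits

# Token occurs in 8 of 10 documents of the category, but in only 10% overall.
observations = [True] * 8 + [False] * 2
informative = use_category_model(observations, 0.1, 0.8, complexity_cost_bits=8.0)
# A token that behaves exactly like the background saves nothing.
uninformative = use_category_model(observations, 0.8, 0.8, complexity_cost_bits=8.0)
```

Tokens whose category behavior matches the background fall
back to the background distribution, which is how tokens
without enough discriminatory power are left out of the model.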



An additional pragmatic advantage to this probabilistic
model choice is that when the logs are taken of the
probabilities to get costs in bits, the probability
calculation for each article's words becomes a simple,
linear one that can be computed in $O(l_d)$, rather than the
longer $O(|dictionary|)$. This is due to the ability to
precompute the sum of the bits required to encode no words
occurring. From this sum the bits required for an actual
document can quickly be computed.
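
A minimal sketch of the precomputation follows. It assumes
length-independent occurrence probabilities strictly between 0
and 1, so that the cost of "no words occurring" is a single
constant per category; the names and probabilities are
hypothetical. A naive version that walks the whole dictionary
is included only to check the shortcut.

```python
import math

def precompute_baseline(p_occurs):
    """Bits to encode 'no token occurred' over the whole dictionary, computed once."""
    return sum(-math.log2(1.0 - p) for p in p_occurs.values())

def document_bits(doc_tokens, p_occurs, baseline):
    """Bits for one document, touching only the tokens it actually contains."""
    total = baseline
    for t in doc_tokens:
        p = p_occurs[t]
        # Swap the precomputed 'absent' cost for the 'present' cost.
        total += -math.log2(p) + math.log2(1.0 - p)
    return total

def document_bits_naive(doc_tokens, p_occurs):
    """Reference version that loops over the entire dictionary."""
    return sum(
        -math.log2(p if t in doc_tokens else 1.0 - p) for t, p in p_occurs.items()
    )

p_occurs = {"fish": 0.8, "tank": 0.6, "armor": 0.05, "water": 0.5}
baseline = precompute_baseline(p_occurs)
fast = document_bits({"fish", "water"}, p_occurs, baseline)
slow = document_bits_naive({"fish", "water"}, p_occurs)
```

The fast path does work proportional to the number of unique
tokens in the document, while the naive path does work
proportional to the dictionary size; both return the same
number of bits.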

One method for learning at least one of the TF-IDF and
MDL approaches can employ the following steps:

1. Divide the articles into training and unseen test sets.

2. Parse the training articles, throwing out tokens
occurring less than a preselected threshold.

3. For TF-IDF, also throw out the F most frequent tokens
over the entire training set.

4. Compute $t_i$ and $r_{i,l}$ for each token.

5. For TF-IDF, compute the term weights, normalize the
weight vector for each informon, and find the average
of the vectors for each rating category.

6. For MDL, decide for each token t and category c whether
to use p(t|l,c) = p(t|l), or use a community dependent
model for when t occurs in c. Then pre-compute the
encoding lengths for no tokens occurring for informons
in each community.

7. For TF-IDF, compute the similarity of each training
informon to each rating category prototype using, for
example, the cosine similarity metric.

8. For MDL, compute the similarity of each training
informon to each rating category by taking the inverse
of the number of bits needed to encode $T_d$ under the
community's probabilistic model.

9. Using the similarity measurements computed in steps 7
or 8 on the training data, compute a linear regression
from rating community similarities to continuous rating
predictions.

10. Apply the model obtained in steps 7-9 similarly to test
informons.
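
Steps 1 through 3 above can be sketched as follows. The sketch
assumes articles are lists of tokens, interprets the threshold
as a minimum total occurrence count in the training set, and
breaks frequency ties alphabetically; these choices, the
function name `prune_vocabulary`, and the sample articles are
hypothetical.

```python
from collections import Counter

def prune_vocabulary(training_docs, min_count, drop_most_frequent=0):
    """Return the set of tokens kept after the two pruning steps.

    min_count: tokens occurring fewer times than this are discarded (step 2).
    drop_most_frequent: number F of top tokens to discard, TF-IDF only (step 3).
    """
    counts = Counter(t for doc in training_docs for t in doc)
    kept = {t for t, c in counts.items() if c >= min_count}
    ranked = sorted(kept, key=lambda t: (-counts[t], t))  # ties broken alphabetically
    return set(ranked[drop_most_frequent:])

articles = [
    ["the", "fish", "tank"],
    ["the", "fish", "armor"],
    ["the", "tank", "heater"],
    ["the", "armor", "polish"],
]
train, test = articles[:3], articles[3:]          # step 1: hold out unseen test data
vocab_mdl = prune_vocabulary(train, min_count=2)
vocab_tfidf = prune_vocabulary(train, min_count=2, drop_most_frequent=1)
```

Only the training articles contribute to the counts, so the
held-out test articles remain unseen when the vocabulary is
fixed.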

Figure 1 illustrates one embodiment of an information
filtering apparatus 1 according to the invention herein. In
general, a data stream is conveyed through network 3, which
can be a global internetwork. A skilled artisan would
recognize that apparatus 1 can be used with other types of
networks, including, for example, an enterprise-wide
network, or "intranet." Using network 3, User #1 (5) can
communicate with other users, for example, User #2 (7) and
User #3 (9), and also with distributed network resources
such as resource #1 (11) and resource #2 (13).

Apparatus 1 is preferred to be part of computer system
16, although User #1 (5) is not required to be the sole user
of computer system 16. In one present embodiment, it is
preferred that computer system 16 having information filter
apparatus 1 therein filter information for a plurality of
users. One application for apparatus 1, for example, could
be that user 5 and similar users may be subscribers to a
commercial information filtering service, which can be
provided by the owner of computer system 16.

Extraction means 17 can be coupled with, and receive
data stream 15 from, network 3. Extraction means 17 can
identify and extract raw informons 19 from data stream 15.
Each of the raw informons 19 has an informon content.
Extraction means 17 uses the adaptive content filter, and at
least part of the adaptive content profile, to analyze the
data stream for the presence of raw informons. Raw informons
are those data entities whose content identifies them as
being "in the ballpark," or of potential interest to a
community coupled to apparatus 1. Extraction means 17 can
remove duplicate informons, even if the informons arrive
from different sources, so that user resources are not
wasted by handling and viewing repetitive and cumulative
information. Extraction means 17 also can use at least part
of the community profile and the user profile for User #1
(5) to determine whether the informon content is relevant to
the community of which User #1 is a part.
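
One simple way to remove duplicate informons arriving from
different sources is sketched below. It assumes that two
informons count as duplicates when their text is identical
after lowercasing and collapsing whitespace; the specification
does not define the comparison, so this notion of "duplicate,"
the source labels, and the function names are hypothetical.

```python
import hashlib

def content_key(text):
    """Hash of whitespace- and case-normalized content (an assumed notion of 'duplicate')."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def remove_duplicates(informons):
    """Yield the first informon seen for each distinct content, regardless of source."""
    seen = set()
    for source, text in informons:
        key = content_key(text)
        if key not in seen:
            seen.add(key)
            yield source, text

stream = [
    ("resource-1", "Water temperature for tropical fish"),
    ("resource-2", "water  temperature for Tropical Fish"),
    ("resource-2", "Medieval armor refurbishing"),
]
unique = list(remove_duplicates(stream))
```

The key is computed from content alone, so the same text
arriving from a second source is dropped before it consumes
further filtering resources.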

Filter means 21 adaptively filters raw informons 19 and
produces proposed informons 23 which are conveyed to User #1
(5) by communication means 25. A proposed informon is a
selected raw informon that, based upon the respective member
client and community profiles, is predicted to be of
particular interest to a member client of User 5. Filter
means 21 can include a plurality of community filters 27a,b
and a plurality of member client filters 28a-e, each
respectively having community and member client profiles.
When raw informons 19 are filtered by filter means 21, those
informons that are predicted to be suitable for a particular
member client of a particular community, e.g., User #1 (5),
responsive to the respective community and member client
profiles, are conveyed thereto. Where such is desired,
filter means 21 also can include a credibility filter 35
which enables means 21 to perform credibility filtering of
raw informons 19 according to a credibility profile.

It is preferred that the adaptive filtering performed
within filter means 21 by the plurality of filters 27a,b,
28a-e, and 35 use self-optimizing adaptive filtering so
that each of the parameters processed by filters 27a,b,
28a-e, and 35 is driven continually to respective values
corresponding to a minimal error for each individual
parameter. Self-optimization encourages a dynamic,
marketplace-like operation of the system, in that those
entities having the most desirable value, e.g., highest
credibility, lowest predicted error, etc., are favored to
prevail.

Self-optimization can be effected according to
respective preselected self-optimizing adaptation techniques
including, for example, one or more of a
top-key-word-selection adaptation technique, a
nearest-neighbor adaptation technique, a term-weighting
adaptation technique, a probabilistic adaptation technique,
and a neural network learning technique. In one present
embodiment of the invention, the term-weighting adaptation
technique is preferred to be a TF-IDF technique and the
probabilistic adaptation technique is preferred to be a MDL
technique.

When user 5 receives proposed informon 23 from 
apparatus 1, user 5 is provided with multiple feedback 
queries along with the proposed informon. By answering, 
user 5 creates a feedback profile that corresponds to 
feedback response 29. User feedback response 29 can be 
active feedback, passive feedback, or a combination. Active 
feedback can include the user's numerical rating for an 
informon, hints, and indices. Hints can include like or 
dislike of an author, and informon source and timeliness. 
Indices can include credibility, agreement with content or
author, humor, or value. Feedback response 29 provides an
actual response to proposed informon 23, which is a measure
of the relevance of the proposed informon to the information
need of user 5. Such relevance feedback attempts to improve
the performance for a particular profile by modifying the
profiles, based on feedback response 29.



The predicted response anticipated by adaptive
filtering means 21 can be compared to the actual feedback
response 29 of user 5 by first adaptation means 30, which
derives a prediction error. First adaptation means 30 also
can include prediction means 33, which collects a number of
temporally-spaced feedback responses, to update the adaptive
collaboration profile, the adaptive content profile, or
both, with an adapted future prediction 34, in order to
minimize subsequent prediction errors by the respective
adaptive collaboration filter and adaptive content filter.
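
The feedback loop can be illustrated with a small sketch in
which a profile is a set of weights, the predicted response is
a weighted sum, and collected feedback is used to shrink the
prediction error. The least-mean-squares update used here is
an assumption chosen only for illustration; the specification
does not prescribe this rule, and the feature names, learning
rate, and ratings are hypothetical.

```python
def predict(profile, features):
    """Predicted rating: weighted sum of an informon's feature values."""
    return sum(profile.get(name, 0.0) * value for name, value in features.items())

def adapt(profile, feedback, learning_rate=0.1):
    """Apply one least-mean-squares pass over collected feedback responses.

    feedback: list of (features, actual_rating) pairs gathered over time.
    Returns a new profile; the input profile is left unchanged.
    """
    updated = dict(profile)
    for features, actual in feedback:
        error = actual - predict(updated, features)      # prediction error
        for name, value in features.items():
            updated[name] = updated.get(name, 0.0) + learning_rate * error * value
    return updated

def mean_squared_error(profile, feedback):
    """Average squared prediction error over the collected feedback."""
    return sum((a - predict(profile, f)) ** 2 for f, a in feedback) / len(feedback)

# Two temporally spaced feedback responses with hypothetical feature values.
feedback = [({"content": 1.0, "collaboration": 0.5}, 4.0),
            ({"content": 0.2, "collaboration": 1.0}, 2.0)]
profile = {"content": 0.0, "collaboration": 0.0}
before = mean_squared_error(profile, feedback)
for _ in range(50):
    profile = adapt(profile, feedback)
after = mean_squared_error(profile, feedback)
```

Each pass compares the predicted response with the actual
response and nudges the weights in the direction that reduces
the error, so later predictions for similar informons move
closer to the user's ratings.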

In one embodiment of the invention herein, it is
preferred that prediction means 33 be a self-optimizing
prediction means using a preselected learning technique.
Such techniques can include, for example, one or more of a
top-key-word-selection learning technique, a
nearest-neighbor learning technique, a term-weighting
learning technique, and a probabilistic learning technique.
First adaptation means 30 also can include a neural network
therein and employ a neural network learning technique for
adaptation and prediction. In one present embodiment of the
invention, the term-weighting learning technique is
preferred to be a TF-IDF technique and the probabilistic
learning technique is preferred to be a MDL learning
technique.

First adaptation means 30 further can include second
adaptation means 32 for adapting at least one of the
adaptive collaboration profiles, the adaptive content
profiles, the community profile, and the user profile,
responsive to at least one of the other profiles. In this
manner, trends attributable to individual member clients,
individual users, and individual communities in one domain
of system 16 can be recognized by, and influence, similar
entities in other domains contained within system 16 to the
extent that the respective entities share common attributes.

Apparatus 1 also can include a computer storage means
31 for storing the profiles, including the adaptive
content profile and the adaptive collaboration
profile. Additional trend-tracking information can be
stored for later retrieval in storage means 31, or may be
conveyed to network 3 for remote analysis, for example, by
User #2 (7).

Figure 2 illustrates another preferred embodiment of
information filtering apparatus 50, in computer system 51.
Apparatus 50 can include first processor 52, second
processors 53a,b, third processors 54a-d, and a fourth
processor 55, to effect the desired information filtering.
First processor 52 can be coupled to, and receive a data
stream 56 from, network 57. First processor 52 can serve as
a pre-processor by extracting raw informons 58 from data
stream 56 responsive to preprocessing profile 49 and
conveying informons 58 to second processors 53a,b.

Because of the inconsistencies presented by the
nearly-infinite individual differences in the modes of
conceptualization, expression, and vocabulary among users,
even within a community of coinciding interests, similar
notions can be described with vastly different terms and
connotations, greatly complicating informon
characterization. Mode variations can be even greater
between disparate communities, discouraging interaction and
knowledge-sharing among communities. Therefore, it is
particularly preferred that processor 52 create a
mode-invariant representation for each raw informon, thus
allowing fast, accurate informon characterization and
collaborative filtering. Mode-invariant representations
tend to facilitate relevant informon selection and
distribution within and among communities, thereby promoting
knowledge-sharing and benefitting the group of
interlinked communities, i.e., a society, as well.

First processor 52 also can be used to prevent
duplicate informons, e.g., the same information from
different sources, from further penetrating, and thus
consuming the resources of, the filtering process. Other
processors 53a,b, 54a-d also may be used to perform the
duplicate information elimination function, but additionally
may measure the differences between the existing informon
and new informons. That difference between the content of
the informon the previous time the user reviewed it and the
content of the informon in its present form is the "delta"
of interest. Processors 53a,b, 54a-d may eliminate the
informon from further processing, or direct the new, altered
informon to the member client, in the event that the nature
or extent of the change exceeds a "delta" threshold. In
general, from the notion of exceeding a preselected delta
threshold, one may infer that the informon has changed to
the extent that the change is interesting to the user. The
nature of this change can be shared among all of a user's
member clients. This delta threshold can be preselected by
the user, or by the preselected learning technique. Such
processing, or "delta learning," can be accomplished by
second processors 53a,b, alone or in concert with third
processors 54a-d. Indeed, third processors 54a-d can be the
locus for delta learning, where processors 54a-d adapt a
delta learning profile for each member client of the
community, i.e., user, thus anticipating those changes in
existing informons that the user may find "interesting."
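
A delta threshold test can be sketched as below. The sketch
assumes the "delta" is measured as the fraction of words that
changed between the previously reviewed content and the
present content, using a word-level sequence comparison from
the Python standard library; the specification does not fix a
particular measure, and the threshold of 0.3, the return
labels, and the sample texts are hypothetical.

```python
import difflib

def delta(previous_text, current_text):
    """Fraction of content that changed, from 0.0 (identical) to 1.0 (nothing shared)."""
    matcher = difflib.SequenceMatcher(None, previous_text.split(), current_text.split())
    return 1.0 - matcher.ratio()

def route(previous_text, current_text, delta_threshold):
    """Forward a revised informon only when its change exceeds the threshold."""
    if delta(previous_text, current_text) > delta_threshold:
        return "deliver"
    return "eliminate"

seen = "keep tropical fish at 25 degrees"
minor = "keep tropical fish at 25 degrees"
major = "treat sick tropical fish with antibiotics at once"
unchanged_decision = route(seen, minor, 0.3)
changed_decision = route(seen, major, 0.3)
```

In a delta learning arrangement the threshold would itself be
adapted per member client rather than fixed, so that the
system learns how much change a given user finds interesting.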

Second processors 53a,b can filter raw informons 58 and
extract proposed community informons 59a,b therefrom.
Informons 59a,b are those predicted by processors 53a,b to
be relevant to the respective communities, in response to
community profiles 48a,b that are unique to each of the
communities. Although only two second processors 53a,b are
shown in Figure 2, system 51 can be scaled to support many
more processors, and communities. It is presently preferred
that second processors 53a,b extract community informons
59a,b using a two-step process. Where processor 52 has
generated mode-invariant concept representations of the raw
informons, processor 53a,b can perform concept-based
indexing, and then provide detailed community filtering of
each informon.

Third processors 54a-d can receive community informons
59a,b from processors 53a,b, and extract proposed member
client informons 61a-d therefrom, responsive to unique
member client profiles 62a-d for respective ones of member
clients 63a-d. Each user can be represented by multiple
member clients in multiple communities. For example, each
of users 64a,b can maintain interests in each of the
communities serviced by respective second processors 53a,b,
and each receive separate member client informons 61b,c and
61a,d, respectively.

Each member client 63a-d provides respective member
client feedback profiles 65a-d to fourth processor 55,
responsive to the proposed member client informons 61a-d.
Based upon the member client feedback profiles 65a-d,
processor 55 updates at least one of the preprocessing
profile 49, community profiles 48a,b and member client
profiles 62a-d. Also, processor 55 adapts at least one of
the adaptive content profile 68 and the adaptive
collaboration profile 69, responsive to profiles 49, 48a,b,
and 62a-d.

Fourth processor 55 can include a plurality of adaptive
filters 66a-d for each of the aforementioned profiles and
computer storage therefor. It is preferred that the
plurality of adaptive filters 66a-d be self-optimizing
adaptive filters. Self-optimization can be effected
according to a preselected self-optimizing
adaptation technique including, for example, one or more of
a top-key-word-selection adaptation technique, a
nearest-neighbor adaptation technique, a term-weighting
adaptation technique, and a probabilistic adaptation
technique. Apparatus 50 also may include a neural network as
one or more of adaptive filters 66a-d. In one present
embodiment of the invention, the term-weighting adaptation
technique is preferred to be a TF-IDF technique and the
probabilistic adaptation technique is preferred to be a MDL
technique.

An artisan would recognize that one or more of the
processors 52-55 could be combined functionally so that the
actual number of processors used in the apparatus 50 could
be less than, or greater than, that illustrated in Figure 2.
For example, in one embodiment of the present invention,
first processor 52 can be in a single microcomputer
workstation, with processors 53-55 being implemented in
additional microcomputer systems. Suitable microcomputer
systems can include those based upon the Intel® Pentium-Pro™
microprocessor. In fact, the flexibility of design
presented by the invention allows for extensive scalability
of apparatus 50, in which the number of users, and the
communities supported may be easily expanded by adding
suitable processors. As described in the context of Figure
1, the interrelation of the several adaptive profiles and
respective filters allows trends attributable to individual
member clients, individual users, and individual communities
in one domain of system 51 to be recognized by, and
influence, similar entities in other domains of system 51
to the extent that the respective entities in the different
domains share common attributes.

The invention herein also comprehends a method 100 for
information filtering in a computer system, as illustrated
in Figure 3, which includes providing a dynamic informon
characterization (step 105) having a plurality of profiles
encoded therein, including an adaptive content profile and
an adaptive collaboration profile; and adaptively filtering
the raw informons (step 110) responsive to the dynamic
informon characterization, thereby producing a proposed
informon. The method continues by presenting the proposed
informon to the user (step 115) and receiving a feedback
profile from the user (step 120), responsive to the proposed
informon. Also, the method includes adapting at least one
of the adaptive content profile (step 125) and the adaptive
collaboration profile responsive to the feedback profile;
and updating the dynamic informon characterization (step
130) responsive thereto.

The adaptive filtering (step 110) in method 100 can be
distributed adaptive filtering that includes community
filtering (substep 135), producing a community profile for
each community, and client filtering (substep 140),
similarly producing a member client profile for each member
client of each community. It is preferred that the filtering
in substeps 135 and 140 be responsive to the adaptive
content profile and the adaptive collaboration profile.
Method 100 comprehends servicing multiple communities and
multiple users. In turn, each user may be represented by
multiple member clients, with each client having a unique
member client profile and being a member of a selected
community.

It is preferred that updating the dynamic informon
characterization (step 130) further include predicting
selected subsequent member client responses (step 150).
Method 100 can also include credibility filtering (step
155) of the raw informons responsive to an adaptive
credibility profile and updating the credibility profile
(step 160) responsive to the user feedback profile. Method
100 further can include creating a consumer profile (step
165) responsive to the feedback profile. In general, the
consumer profile is representative of predetermined consumer
preference criteria relative to the communities of which the
user is a member client. Furthermore, grouping selected
ones (step 170) of the users into a preference cohort,
responsive to the preselected consumer preference criteria,
can facilitate providing a targeted informon (step 175),
such as an advertisement, to the preference cohort.
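
Grouping users into preference cohorts and addressing a
targeted informon to one cohort can be sketched as follows.
The sketch assumes each consumer profile is a flat dictionary
of preference attributes and that a cohort is the set of users
who agree on every selected criterion; the attribute names,
user labels, and function names are hypothetical.

```python
from collections import defaultdict

def group_into_cohorts(consumer_profiles, criteria):
    """Group users whose profiles agree on every selected preference criterion.

    consumer_profiles: dict user -> dict of preference attributes.
    criteria: names of the attributes that define a cohort.
    """
    cohorts = defaultdict(list)
    for user, profile in consumer_profiles.items():
        key = tuple(profile.get(name) for name in criteria)
        cohorts[key].append(user)
    return dict(cohorts)

def target(cohorts, key, informon):
    """Pair a targeted informon with every member of one cohort."""
    return [(user, informon) for user in cohorts.get(key, [])]

profiles = {
    "user5": {"community": "aquaria", "budget": "high"},
    "user7": {"community": "aquaria", "budget": "high"},
    "user9": {"community": "armor", "budget": "low"},
}
cohorts = group_into_cohorts(profiles, ["community", "budget"])
ads = target(cohorts, ("aquaria", "high"), "aquarium heater advertisement")
```

Because cohort keys are derived from the profiles, re-running
the grouping after profiles are updated moves users between
cohorts as their preferences shift.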

Figure 4 describes yet another preferred embodiment,
method 200, according to the invention herein. In general,
method 200 includes partitioning (step 205) each user into
multiple member clients, each having a unique member client
profile with multiple client attributes and grouping member
clients (step 210) to form multiple communities with each
member client in a particular community sharing selected
client attributes with other member clients, thereby
providing each community with a unique community profile
having common client attributes.

Method 200 continues by predicting a community profile
(step 215) for each community using first prediction
criteria, and predicting a member client profile (step 220)
for a member client in a particular community using second
prediction criteria. Method 200 also includes the steps of
extracting raw informons (step 225) from a data stream and
selecting proposed informons (step 230) from raw informons.
The proposed informons generally are correlated with one or
more of the common client attributes of a community, and of
the member client attributes of the particular member client
to whom the proposed informon is offered. After providing
the proposed informons to the user (step 235), receiving
user feedback (step 240) in response to the proposed
informons permits the updating of the first and second
prediction criteria (step 245) responsive to the user
feedback.

Method 200 further may include prefiltering the data
stream (step 250) using the predicted community profile,
with the predicted community profile identifying the raw
informons in the data stream.

Step 230 of selecting proposed informons can include
filtering the raw informons using an adaptive content filter
(step 255) responsive to the informon content; filtering the
raw informons using an adaptive collaboration filter (step
260) responsive to the common client attributes for the
community; and filtering the raw informons using
an adaptive member client filter (step 265) responsive to
the unique member client profile.

It is preferred that updating the first and second
prediction criteria (step 245) employ a self-optimizing
adaptation technique, including, for example, one or more of
a top-key-word-selection adaptation technique, a
nearest-neighbor adaptation technique, a term-weighting
adaptation technique, and a probabilistic adaptation
technique. It is further preferred that the term-weighting
adaptation technique be a TF-IDF technique and the
probabilistic adaptation technique be a minimum description
length technique.

In a most preferred embodiment, illustrated in Figure
5, the information filtering method according to the present
invention provides rapid, efficient data reduction and
routing, or filtering, to the appropriate member client.
The method 300 includes parsing the data stream into tokens
(step 301); creating a mode-invariant (MI) profile of the
informon (step 305); selecting the most appropriate
communities for each informon, based on the MI profile,
using concept-based indexing (step 310); detailed analysis
(step 315) of each informon with regard to its fit within
each community; eliminating poor-fitting informons (step
320); detailed profiling of each informon relative to fit
for each member client (step 325); eliminating poor-fitting
informons (step 330); presenting the informon to the member
client/user (step 335); and obtaining the member client/user
response, including multiple ratings for different facets of
the user's response to the informon (step 340).

It is preferred that coherent
portions of the data stream, i.e., potential raw informons,
be first parsed (step 301) into generalized words, called
tokens. Tokens include punctuation and other specialized
symbols that may be part of the structure found in the
article headers. For example, in addition to typical words
such as "seminar" counting as tokens, the punctuation mark
"$" and the symbol "Newsgroup: comp.ai" are also tokens.
Using noun phrases as tokens also can be useful.
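
The parsing of step 301 can be illustrated with a minimal
Python sketch. The function name tokenize, the two regular
expressions, and the decision to treat a whole
"Newsgroup: <name>" header line as a single token are
assumptions made for illustration only; the specification
does not prescribe a particular tokenizer.

```python
import re

# Assumption: a header line of the form "Newsgroup: <name>" becomes one token.
HEADER_RE = re.compile(r"^(Newsgroup):\s*(\S+)\s*$")
# Words (left unstemmed) or single punctuation/symbol characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*|[^\sA-Za-z0-9]")


def tokenize(text):
    """Split a raw informon into generalized words (tokens)."""
    tokens = []
    for line in text.splitlines():
        header = HEADER_RE.match(line.strip())
        if header:
            # Keep the structured header intact as a single token.
            tokens.append(f"{header.group(1)}: {header.group(2)}")
            continue
        # Ordinary text: words and punctuation marks are separate tokens.
        tokens.extend(TOKEN_RE.findall(line))
    return tokens


if __name__ == "__main__":
    print(tokenize("Newsgroup: comp.ai\nSeminar fee: $5"))
```

Noun-phrase tokens, also mentioned above, would need a
phrase chunker and are outside the scope of this sketch.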

Next a vector of token counts for the document is
created. This vector is the size of the total vocabulary,
with zeros for tokens not occurring in the document. Using
this type of vector is sometimes called the bag-of-words
model. While the bag-of-words model does not capture the
order of the tokens in the document, which may be needed for
linguistic or syntactic analysis, it captures most of
the information needed for filtering purposes.
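
A token-count vector of this kind might be built as in the
following sketch. The function name count_vector and the
sample vocabulary are hypothetical.

```python
from collections import Counter


def count_vector(tokens, vocabulary):
    """Return one count per vocabulary entry, with zeros for absent tokens."""
    counts = Counter(tokens)
    return [counts.get(term, 0) for term in vocabulary]


if __name__ == "__main__":
    vocabulary = ["seminar", "$", "lathe", "Newsgroup: comp.ai"]
    print(count_vector(["seminar", "$", "seminar"], vocabulary))
```

The vector has the same length as the vocabulary regardless
of document length, and token order is discarded, as noted
above.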
Although it is common in information retrieval systems
to group the tokens together by their common linguistic
roots, called stemming, as a next step it is preferred in
the present invention that the tokens be left in their
unstemmed form. In this form, the tokens are amenable to
being classified into mode-invariant concept components.

Creating a mode-invariant profile (step 305), C,
includes creating a conceptual representation for each
informon, A, that is invariant with respect to the
form-of-expression, e.g., vocabulary and conceptualization.
Each community can consist of a "Meta-U-Zine" collection, M,
of informons. Based upon profile C, the appropriate
communities, if any, for each informon in the data stream
are selected by concept-based indexing (step 310) into each
M. That is, for each concept C that describes A, put A into
a queue Q_M, for each M which is related to C. It is
preferred that there is a list of Ms that is stored for each
concept and that can be easily index-searched. Each A that
is determined to be a poor fit for a particular M is
eliminated from further processing. Once A has been matched
with a particular M, a more complex community profile P_M is
developed and maintained for each M (step 315). If A has
fallen into Q_M, then A is analyzed to determine whether it
matches P_M strongly enough to be retained or "weeded" out
(step 320) at this stage.
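
The queueing rule "for each concept C that describes A, put
A into a queue for each M related to C" can be sketched as
follows. The dictionary-based index, the function name
route_to_queues, and the community labels are assumptions
for illustration.

```python
from collections import defaultdict


def route_to_queues(informon_id, concepts, concept_index, queues):
    """Place an informon in the queue of every community indexed by its concepts.

    concept_index maps a concept to the list of communities M related to it
    (the stored, index-searchable list of Ms mentioned in the text).
    """
    for concept in concepts:
        for community in concept_index.get(concept, ()):
            if informon_id not in queues[community]:  # avoid duplicate entries
                queues[community].append(informon_id)
    return queues


if __name__ == "__main__":
    index = {"automobile": ["cars-M", "racing-M"], "gardening": ["garden-M"]}
    queues = defaultdict(list)
    route_to_queues("A1", ["automobile"], index, queues)
    route_to_queues("A2", ["gardening", "automobile"], index, queues)
    print(dict(queues))
```

An informon whose concepts index into no community is
placed in no queue, and so drops out of further processing.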

Each A for a particular M is sent to each user's
personal agent, or member client U of M, for additional
analysis based on the member client's profile (step 325).
Each A that fits U's interests sufficiently is selected for
U's personal informon, or "U-Zine," collection, Z.
Poor-fitting informons are eliminated from placement in Z
(step 330). This user-level stage of analysis and selection
may be performed on a centralized server site or on the
user's computer.

Next, the proposed informons are presented to user U
(step 335) for review. User U reads and rates each selected
A found in Z (step 340). The feedback from U can consist of
a rating for how "interesting" U found A to be, as well as
one or more of the following:

Opinion Feedback: Did U agree, disagree, or have no
opinion regarding the position of A?

Credibility Feedback: Did U find the facts, logic,
sources, and quotes in A to be truthful and credible or
not?

Informon Qualities: How does the user rate the
informon's qualities, for example, "interestingness,"
credibility, funniness, content value, writing quality,
violence content, sexual content, profanity level,
business importance, scientific merit,
surprise/unexpectedness of information content,
artistic quality, dramatic appeal, entertainment value,
trendiness/importance to future directions, and opinion
agreement?

Specific Reason Feedback: Why did the user like or
dislike A?
    Because of the authority?
    Because of the source?
    Because A is out-of-date (e.g., weather report from
    3 weeks ago)?
    Because the information contained in A has been
    seen already? (I.e., the problem of duplicate
    information delivery)

Categorization Feedback: Did U like A? Was it placed
within the correct M and Z?

Such multi-faceted feedback queries can produce rich
feedback profiles from U that can be used to adapt each of
the profiles used in the filtering process to some optimal
operating point.

One embodiment of creating a MI profile (step 305) for
each concept can include concept profiling, creation, and
optimization. Broad descriptors can be used to create a
substantially-invariant concept profile, ideally without the
word choice used to express concept C. A concept profile
can include positive concept clues (PCC) and negative
concept clues (NCC). The PCC and NCC can be combined by a
processor to create a measure-of-fit that can be compared to
a predetermined threshold. If the combined effect of the
PCC and NCC exceeds the predetermined threshold, then
informon A can be assumed to be related to concept C;
otherwise it is eliminated from further processing. PCC is
a set of words, phrases, and other features, such as the
source or the author, each with an associated weight, that
tend to be in A which contains C. In contrast, NCC is a set
of words, phrases, and other features, such as the source or
the author, each with an associated weight, that tend to make
it more unlikely that A is contained in C. For example, if
the term "car" is in A, then it is likely to be about
automobiles. However, if the phrase "bumper car" also is in
A, then it is more likely that A is related to amusement
parks. Therefore, "bumper car" would fall into the profile of
negative concept clues for the concept "automobile."
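
One simple way to combine PCC and NCC into a measure-of-fit
is sketched below. The text states only that the clues are
"combined by a processor"; subtracting the summed NCC weights
from the summed PCC weights is an assumption, as are the
function names, the clue weights, and the threshold value.

```python
def measure_of_fit(features, pcc, ncc):
    """Sum the weights of matched positive clues minus matched negative clues.

    features: set of words, phrases, or other features found in informon A.
    pcc, ncc: dicts mapping a clue to its (hypothetical) weight.
    """
    positive = sum(weight for clue, weight in pcc.items() if clue in features)
    negative = sum(weight for clue, weight in ncc.items() if clue in features)
    return positive - negative


def matches_concept(features, pcc, ncc, threshold):
    """Informon A is related to concept C only if the fit exceeds the threshold."""
    return measure_of_fit(features, pcc, ncc) > threshold


if __name__ == "__main__":
    # Hypothetical profile for the concept "automobile".
    pcc = {"car": 1.0, "engine": 0.5}
    ncc = {"bumper car": 1.5}
    print(matches_concept({"car", "engine"}, pcc, ncc, threshold=0.4))
    print(matches_concept({"car", "bumper car"}, pcc, ncc, threshold=0.4))
```

With these illustrative weights, an informon mentioning
"car" alone passes the threshold, while one that also
contains "bumper car" is eliminated.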

Typically, concept profile C can be created by one or
more means. First, C can be explicitly created by user U.
Second, C can be created by an electronic thesaurus or
similar device that can catalog and select from a set of
concepts and the words that can be associated with that
concept. Third, C can be created by using co-occurrence
information that can be generated by analyzing the content
of an informon. This means uses the fact that related
features of a concept tend to occur more often within the
same document than in general. Fourth, C can be created by
the analysis of collections, H, of A that have been rated by
one or more U. Combinations of features that tend to occur
repeatedly in H can be grouped together as PCC for the
analysis of a new concept. Also, an A that one or more U
have rated and determined not to be within a particular Z
can be used for the extraction of NCC.

Concept profiles can be optimized or learned
continually after their creation, with the objective that
nearly all As that Us have found interesting, and belonging
in M, should pass the predetermined threshold of at least
one C that can serve as an index into M. Another objective
of concept profile management is that, for each A that does
not fall into any of the one or more M that are indexed by
C, the breadth of C is adjusted to preserve the first
objective, insofar as possible. For example, if C's
threshold is exceeded for a given A, C's breadth can be
narrowed by reducing PCC, increasing NCC, or both, or by
increasing the threshold for C.

In the next stage of filtering, one embodiment of
content-based indexing takes an A that has been processed
into the set of C that describe it, and determines which M
should accept the article for subsequent filtering, for
example, detailed indexing of incoming A. It is preferred
that a data structure including a database be used, so that
the vector of Ms that are related to any concept C may be
looked up. Furthermore, when a Z is created by U, the
concept clues given by U to the information filter can be
used to determine a set of likely concepts C that describe
what U is seeking. For example, if U types in "basketball"
as a likely word in the associated Z, then all concepts that
have a high positive weight for the word "basketball" are
associated with the new Z. If no such concepts C seem to
pre-exist, an entirely new concept C is created that is
endowed with the clues U has given as the starting profile.

To augment the effectiveness of concept-based indexing,
it is preferred to provide continual optimization learning.
In general, when a concept C no longer uniquely triggers any
documents that have been classified and liked by member
clients U in a particular community M, then that M is
removed from the list of M indexed into by C. Also, when
there appears to be significant overlap between articles
fitting concept C and articles that have been classified
by users as belonging to M, and if C does not currently
index into M, then M can be added to the list of M indexed
into by C. The foregoing heuristic for expanding the
concepts C that are covered by M can potentially make M too
broad and, thus, accept too many articles. Therefore, it
further is preferred that a reasonable but arbitrary limit
be set on the conceptual size covered by M.

With regard to the detailed analysis of each informon A
with respect to the community profile for each M, each A
must pass through this analysis for each U subscribing to a
particular M, i.e., for each member client in a particular
community. After A has passed that stage, it is then
filtered at a more personal, member client level for each of
those users. The profile and filtering process are very
similar for both the community level and the member client
level, except that at the community level, the empirical
data obtained is for all U who subscribed to M, and not
merely an individual U. Other information about the
individual U can be used to help the filter, such as what U
thinks of what a particular author writes in other Zs that
the user reads, and articles that can't be used for the
group-level M processing.

Figure 6 illustrates the development of a profile, and
its associated predictors. Typically, regarding the
structure of a profile 400, the information input into the
structure can be divided into three broad categories: (1)
Structured Feature Information (SFI) 405; (2) Unstructured
Feature Information (UFI) 410; and (3) Collaborative Input
(CI) 415. Features derived from combinations of these three
types act as additional peer-level inputs for the next level
of the rating prediction function, called (4)
Correlated-Feature, Error-Correction Units (CFECU) 420. From
inputs 405, 410, 415, 420, learning functions 425a-d can be
applied to get two computed functions 426a-d, 428a-d of the
inputs. These two functions are the Independent Rating
Predictors (IRP) 426a-d, and the associated Uncertainty
Predictors (UP) 428a-d. IRPs 426a-d can be weighted by
dividing them by their respective UPs 428a-d, so that the
more certain an IRP 426a-d is, the higher its weight. Each
weighted IRP 429a-d is brought together with other IRPs
429a-d in a combination function 427a-d. This combination
function 427a-d can range from a simple, weighted, additive
function to a far more complex neural network function. The
results from this are normalized by the total uncertainty
across all UPs, from Certain = zero to Uncertain = infinity,
and combined using the Certainty Weighting Function (CWF)
430. Once the CWF 430 has combined the IRPs 426a-d, it is
preferred that result 432 be shaped via a monotonically
increasing function, to map to the range and distribution of
the actual ratings. This function is called the Complete
Rating Predictor (CRP) 432.
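
The certainty weighting described above can be illustrated
with a short sketch. It assumes the simple additive form of
the combination function: each IRP is divided by its UP and
the sum is normalized by the sum of the reciprocal UPs. The
function names, and the clamp used as a stand-in for the
monotone shaping function, are hypothetical.

```python
import math


def combine_irps(predictions):
    """Combine (irp, up) pairs, weighting each IRP by 1/UP.

    up == 0 means an exact prediction; up == infinity means no knowledge.
    Returns None when no predictor carries any information.
    """
    exact = [irp for irp, up in predictions if up == 0]
    if exact:
        # An exact predictor dominates; average if several are exact.
        return sum(exact) / len(exact)
    weighted = [(irp, 1.0 / up) for irp, up in predictions if math.isfinite(up)]
    total_weight = sum(weight for _, weight in weighted)
    if total_weight == 0:
        return None
    return sum(irp * weight for irp, weight in weighted) / total_weight


def shape_rating(value, low=1.0, high=5.0):
    """Stand-in shaping function: clamp to the rating range (non-decreasing)."""
    return max(low, min(high, value))


if __name__ == "__main__":
    combined = combine_irps([(4.0, 0.5), (2.0, 2.0)])
    print(shape_rating(combined))
```

A learned shaping function that also matches the
distribution of actual ratings would replace the clamp in a
fuller implementation.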

SFI 405 can include vectors of authors, sources, and 
other features of informon A that may be influential in 
determining the degree to which A falls into the categories 
in a given M. UFI 410 can include vectors of important 
words, phrases, and concepts that help to determine the 
degree to which A falls into a given M. Vectors can exist 
for different canonical parts of A. For example, individual
vectors may be provided for subject/headings, content body,
related information in other referenced informons, and the
like. It is preferred that a positive and negative vector 
exists for each canonical part. 

CI 415 is received from other Us who already have seen
A and have rated it. The input used for CI 415 can include,
for example, "interestingness," credibility, funniness,
content value, writing quality, violence content, sexual
content, profanity level, business importance, scientific
merit, surprise/unexpectedness of information content,
artistic quality, dramatic appeal, entertainment value,
trendiness/importance to future directions, and opinion
agreement. Each CFECU 420 is a unit that can detect sets of
specific feature combinations which are exceptions in
combination. For example, author X's articles are generally
disliked in the Z for woodworking, except when X writes
about lathes. When an informon authored by X contains the
concept of "lathes," then the appropriate CFECU 420 is
triggered to signal that this is an exception, and
accordingly a signal is sent to offset the general negative
signal otherwise triggered because of the general dislike
for X's informons in the woodworking Z.



An example of the form of Structured Feature
Information (SFI) 405 can include fields such as Author,
Source, Information-Type, and other fields previously
identified to be of particular value in the analysis. For
simplicity, the exemplary SFI, below, accounts only for the
Author field. For this example, assume three authors A, B,
and C, have collectively submitted 10 articles that have
been read, and have been rated as in TABLE 1. In the
accompanying rating scheme, a rating can vary between 1 and
5, with 5 indicating a "most interesting" article. If four
new articles (11, 12, 13, 14) arrive that have not yet been
rated, and, in addition to authors A, B, and C, a new author
D has contributed, a simple IRP for the Author field, that
just takes sums of the averages, would be as follows:

IRP(author) = weighted sum of
    average(ratings given the author so far)
    average(ratings given the author so far in this M)
    average(ratings given all authors so far in this M)
    average(ratings given all authors)
    average(ratings given the author so far by a particular
        user U)*
    average(ratings given the author so far in this M by a
        particular user U)*
    average(ratings given all authors so far in this M by a
        particular user U)*
    average(ratings given all authors by a particular
        user)*

    * (if for a personal Z)



The purpose of the weighted sum is to make use of broader,
more general statistics, when strong statistics for a
particular user reading an informon by a particular author,
within a particular Z may not yet be available. When
stronger statistics are available, the broader terms can be
eliminated by using smaller weights. This weighting scheme
is similar to that used for creating CWFs 430, for the
profiles as a whole. Some of the averages may be left out in
the actual storage of the profile if, for example, an
author's average rating for a particular M is not
"significantly" different from the average for the author
across all Ms. Here, "significance" is used in a
statistical sense, and frameworks such as the Minimum
Description Length (MDL) Principle can be used to determine
when to store or use a more "local" component of the IRP.

As a simple example, the following IRP employs only two of
the above terms:

IRP(author) = weighted sum of
    average(ratings given this author so far in this M)
    average(ratings given all authors so far in this M)

Table 2 gives the values attained for the four new articles.
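
The two-term IRP can be sketched as follows. The weights 0.7
and 0.3, the function name irp_author, and the sample
ratings are hypothetical; they are not the values of TABLE 1
or TABLE 2. Falling back to the community-wide average for
an author with no ratings (such as new author D) is likewise
an assumed policy.

```python
from statistics import mean


def irp_author(author, ratings, w_author=0.7, w_all=0.3):
    """Two-term IRP for the Author field within one community M.

    ratings: list of (author, rating) pairs observed so far in this M.
    """
    overall = mean(rating for _, rating in ratings)
    own = [rating for name, rating in ratings if name == author]
    if not own:
        # No statistics for this author yet: rely on the broader term only.
        return overall
    return w_author * mean(own) + w_all * overall


if __name__ == "__main__":
    # Hypothetical ratings on the 1-to-5 scale.
    observed = [("A", 5), ("A", 4), ("B", 2), ("C", 3)]
    for name in ("A", "B", "D"):
        print(name, irp_author(name, observed))
```

As more ratings accumulate for an author, w_all could be
reduced so that the broader term contributes less.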

Uncertainty Predictions (UP) 428a-d can be handled
according to the underlying data distribution assumptions.
It is generally important to the uncertainty prediction that
it should approach zero (0) as the IRP 426a-d becomes an
exact prediction, and should approach infinity when there is
no knowledge available to determine the value of an IRP. As
an example, the variance of the rating can be estimated as
the UP. As recognized by a skilled artisan, combining the
variances from the components of the IRP can be done using
several other methods as well, depending upon the
theoretical assumptions used and the computational
efficiency desired. In the present example, shown in Table
3, the minimum of the variances of the components can be
used. In the alternative, the UP 428a-d can be realized by:



UP_alt = [expression in VAR1 and VAR2, illegible in the source]

An example of Unstructured Feature Information (UFI)
410 can include entities such as text body, video/image
captions, song lyrics, subject/titles, reviews/annotations,
and image/audio-extracted features, and the like. Using an
exemplary entity of a text body, a sample of ten (10)
articles that each have some number of 4 words, or tokens,
contained therewithin are listed in TABLE 4. As before, a
rating can be from 1 to 5, with a rating of 5 indicating
"most interesting." This vector can be any weighting scheme
for tokens that allows for comparison between a group of
collected documents, or informons, and a document, or
informon, under question.

As previously mentioned, positive and negative vectors
can provide a weighted average of the informons, according
to their rating by user U. The weighting scheme can be
based on empirical observations of those informons that
produce minimal error through an optimization process.
Continuing in the example, weighting values for the positive
vector can be:



Rating    5      4      3      2      1
Weight    1.0    0.9    0.4    0.1    0.0



Similarly, the negative vector can use a weighting scheme in
the opposite "direction":

Rating    5      4      3      2      1
Weight    0.0    0.1    0.4    0.9    1.0

Using a TF-IDF scheme, the following token vectors can be
obtained:

           Positive    Negative
Token 1    0.71        0.30
Token 2    0.56        0.43
Token 3    0.33        0.60
Token 4    0.0         0.83

In the case where four new documents come in to the
information filter, the documents are then compared with the
profile vector.

For the purposes of the example herein, only the TF-IDF
representation and the similarity metric, i.e., the
normalized dot product, will be used. TABLE 5 illustrates
the occurrences of each exemplary token. TABLE 6
illustrates the corresponding similarity vector
representations using a TF-IDF scheme. The similarity
measure produces a result between 0.0 and 1.0 that is
preferred to be remapped to an IRP. This remapping function
could be as simple as a linear regression, or a one-node
neural net. Here, a simple linear transformation is used,
where

IRP(pos) = 1 + (SIM(pos)) x 4

and

IRP(neg) = 5 - (SIM(neg)) x 4
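
The similarity metric and the linear remapping can be
sketched as below. The sketch assumes that both remappings
span the 1-to-5 rating range, so that a similarity of 1.0 to
the positive vector maps to a rating of 5 and a similarity of
1.0 to the negative vector maps to a rating of 1. The
function names are hypothetical, and the input vectors are
assumed to be TF-IDF weighted already.

```python
import math


def cosine_similarity(u, v):
    """Normalized dot product of two equal-length, non-negative vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def irp_pos(sim_pos):
    """Map similarity to the positive profile onto the rating range 1..5."""
    return 1 + sim_pos * 4


def irp_neg(sim_neg):
    """Map similarity to the negative profile onto the rating range 5..1."""
    return 5 - sim_neg * 4


if __name__ == "__main__":
    document = [1.0, 0.0, 0.0, 0.0]        # hypothetical document vector
    positive = [0.71, 0.56, 0.33, 0.0]     # illustrative profile vectors
    negative = [0.30, 0.43, 0.60, 0.83]
    print(irp_pos(cosine_similarity(document, positive)))
    print(irp_neg(cosine_similarity(document, negative)))
```

A fitted linear regression or a one-node neural net, as
mentioned above, would replace the fixed constants 1, 5, and
4 with learned parameters.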



TABLE 7 illustrates both IRP(pos) and IRP(neg), along with
respective positive and negative squared-error, using the 14
articles, or informons, read and rated thus far in the
ongoing example.

It is preferred that an estimate of the uncertainty
resulting from a positive or negative IRP be made, and a
complex neural net approach could be used. However, a
simpler method, useful for this example, is simply to repeat
the same process that was used for the IRP but, instead of
predicting the rating, it is preferred to predict the
squared-error, given the feature vector. The exact
squared-error values can be used as the informon weights,
instead of using a rating-weight lookup table. A more optimal
mapping function could also be computed, if indicated by the
application.

                   Token 1    Token 2    Token 3    Token 4
IRP pos. vector    16.68      8.73       12.89      11.27
IRP neg. vector    15.20      8.87       4.27       5.04

The UPs then can be computed in a manner similar to the
IRPs: comparisons with the actual document vectors can be
made to get a similarity measure, and then a mapping
function can be used to get a UP.

Making effective use of collaborative input (CI) from
other users U is a difficult problem because of the
following seven issues. First, there generally is no a
priori knowledge regarding which users already will have
rated an informon A, before making a prediction for a user
U, who hasn't yet read informon A. Therefore, a model for
prediction must be operational no matter which subset of the
inputs happen to be available, if any, at a given time.
Second, computational efficiency must be maintained in light
of a potentially very large set of users and informons.
Third, incremental updates of rating predictions often are
desired, as more feedback is reported from users regarding
an informon. Fourth, in learning good models for making
rating predictions, only very sparse data typically is
available for each user's rating of each document. Thus, a
large "missing data" problem must be dealt with effectively.

Fifth, most potential solutions to the CI problem
require independence assumptions that, when grossly
violated, give very poor results. As an example of an
independence assumption violation, assume that ten users of
a collaborative filtering system, called the "B-Team,"
always rate all articles exactly in the same way, for
example, because they think very much alike. Further assume
that user A's ratings are correlated with the B-Team at the
0.5 level, and are correlated with user C at the 0.9 level.
Now, suppose user C reads an article and rates it a "5".
Based on C's rating, it is reasonable to predict that
A's rating also might be a "5". Further, suppose that a
member of the B-Team reads the article, and rates it a "2".
Existing collaborative filtering methods are likely to
predict that A's rating R_A would be:

R_A = (0.9 x 5 + 0.5 x 2)/(0.9 + 0.5) = 3.93

In principle, if other members of the B-Team then read and
rate the article, it should not affect the prediction of A's
rating, R_A, because it is known that other B-Team members
always rate the article with the same value as the first
member of the B-Team. However, the prediction for A by
existing collaborative filtering schemes would tend to give
10 times the weight to the "2" rating, and would be:

R_A = (0.9 x 5 + 10 x 0.5 x 2)/(0.9 + 10 x 0.5) = 2.46

Existing collaborative filtering schemes do not work well in
this case because the B-Team's ratings are not independent,
and have a correlation among one another of 1. The
information filter according to the present invention can
recognize and compensate for such inter-user correlation.
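
The arithmetic of the B-Team example can be reproduced with
the following sketch. The function weighted_prediction is
the naive correlation-weighted average attributed above to
existing schemes. The function collapse_identical_raters is
only one hypothetical way to compensate, by keeping a single
representative of raters known to be perfectly correlated; it
is not presented as the compensation method of the invention.

```python
def weighted_prediction(raters):
    """Correlation-weighted average of ratings.

    raters: list of (correlation_with_target_user, rating) pairs.
    """
    numerator = sum(corr * rating for corr, rating in raters)
    denominator = sum(corr for corr, _ in raters)
    return numerator / denominator


def collapse_identical_raters(raters_by_group):
    """Keep one representative per group of perfectly correlated raters."""
    return [members[0] for members in raters_by_group.values() if members]


if __name__ == "__main__":
    user_c = (0.9, 5)
    b_team_member = (0.5, 2)
    print(round(weighted_prediction([user_c, b_team_member]), 2))
    print(round(weighted_prediction([user_c] + [b_team_member] * 10), 2))
    groups = {"C": [user_c], "B-Team": [b_team_member] * 10}
    print(round(weighted_prediction(collapse_identical_raters(groups)), 2))
```

With the B-Team collapsed to one rater, the prediction stays
at the value obtained after the first B-Team rating, however
many other members rate the article.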

Sixth, information about the community of people is
known, other than each user's ratings of informons. This
information can include the present topics the users like,
what authors the users like, etc. This information can make
the system more effective when it is used for learning
stronger associations between community members. For
example, because users A and B in a particular community M
have never yet read and rated an informon in common, no
correlation between their likes and dislikes can be made,
based on common ratings alone. However, users A and B have
both read and liked several informons authored by the same
author, X, although users A and B each read distinctly
different Zs. Such information can be used to make the
inference that there is a possible relationship between user
A's interests and user B's interests. For the most part,
existing collaborative filtering systems cannot take
advantage of this knowledge.

Seventh, information about the informon under
consideration also is known, in addition to the ratings
given it so far. For example, from knowing that informon A
is about the concept of "gardening", better use can be made
of which users' ratings are more relevant in the context of
the information in the informon. If user B's rating agrees
with user C's rating of articles when the subject is about
"politics", but B's ratings agree more with user D when
informon A is about "gardening", then the relationship
between user B's ratings and user D's ratings is preferred
to be emphasized to a greater extent than the relationship
between user B and user C when making predictions about
informon A.

With regard to the aforementioned fourth, sixth and
seventh issues, namely, making effective use of sparse, but
known, information about the community and the informon, it
is possible to determine the influence of user A's rating of
an informon on the predicted rating of the informon for a
second user, B. For example, where user A and user B have
read and rated in common a certain number of informons, the
influence of user A's rating of informon D on the predicted
rating of informon D for user B can be defined by a
relationship that has two components. First, there can be a
common "mindset," S_M, between user A and user B and informon
D, that may be expressed as:

S_M = profile(A) X profile(B) X DocumentProfile(D).

Second, a correlation may be taken between user A's past
ratings and user B's past ratings with respect to informons
that are similar to D. This correlation can be taken by
weighting all informons E that A and B have rated in common
by the similarity of E to D, S_ED:

S_ED = Weighted_Correlation(ratings(A), ratings(B))

Each of the examples can be weighted by

weight for rating pair (rating(A, D), rating(B, D))
    = DocumentProfile(E) X DocumentProfile(D)

Note that the "X" in the above equation may not be a mere
multiplication or cross-product, but rather be a method for
comparing the similarity between the profiles. Next, the
similarity of the member client profiles and informon
content profiles can be compared. A neural network could
be used to learn how to compare profiles so that the error
in predicted ratings is minimized. However, a simple cosine
similarity metric, as was used earlier in the discussion of
Unstructured Feature Information (UFI), can be used.
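
A weighted correlation of this kind can be sketched as a
weighted Pearson correlation, in which each commonly rated
informon E contributes in proportion to its similarity to D.
The choice of the Pearson form, the function name, and the
sample numbers are assumptions for illustration.

```python
import math


def weighted_correlation(ratings_a, ratings_b, weights):
    """Weighted Pearson correlation of two users' ratings of common informons.

    weights[i] is the similarity of the i-th common informon E to informon D.
    Returns 0.0 when the correlation is undefined (no weight or no variance).
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    mean_a = sum(w * a for w, a in zip(weights, ratings_a)) / total
    mean_b = sum(w * b for w, b in zip(weights, ratings_b)) / total
    cov = sum(w * (a - mean_a) * (b - mean_b)
              for w, a, b in zip(weights, ratings_a, ratings_b)) / total
    var_a = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, ratings_a)) / total
    var_b = sum(w * (b - mean_b) ** 2 for w, b in zip(weights, ratings_b)) / total
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical ratings of three common informons and their similarity to D.
    print(weighted_correlation([1, 2, 5], [1, 2, 1], [1.0, 1.0, 0.1]))
```

An informon E with zero similarity to D has no effect on the
result, which matches the intent of weighting by similarity.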

The method used is preferred to be able to include more
than just the tokens, such as the author and other SFI; and,
it is preferred that the three vectors for component also
are able to be compared. SFIs may be handled by
transforming them into an entity that can be treated in a
comparable way to token frequencies that can be multiplied
in the standard token frequency comparison method, which
would be recognized by a skilled artisan.

Continuing in the ongoing example, the Author field may
be used. Where user A and user B have rated authors K and
L, the token frequency vector may appear as follows:

        Avg. Rating           Avg. Rating           Avg. Rating
        Given to    # in      Given to    # in      Given to    # in
User    Author K    sample    Author L    sample    Author M    sample

A       3.1         21        1.2         5         N/A         0
B       4           1         1.3         7         5           2



Further, the author component of the member client profiles 
of user A and user B may be compared by taking a special 
weighted correlation of each author under comparison. In 
general, the weight is a function F of the sample sizes for 
user A's and user B's rating of the author, where F is the 
product of a monotonically-increasing function of the sample 
size for each of user A and user B. Also, a simple function 
G of whether the informon D is by the author or not is used. 
This function can be: G = g if so, and G = p<q if not, 
where p and q are optimized constraints according to the 
domain of the filtering system. When there has been no 
rating of an author by a user, then the function of the zero 
sample size is positive. This is because the fact that the 
user did not read anything by the author can signify -a- some 
indication that the author might not produce an informon 
which would be highly rated by the user. In this case, the 
exact value is an increasing function H of the total 
articles read by a particular user so far, because it 
becomes more likely that the user is intentionally avoiding 
reading informons by that author with each subsequent 
article that has been read but is not prepared by the
author. In general, the exact weighting function and 
parameters can be empirically derived rather than 
theoretically derived, and so are chosen by the optimization
of the overall rating prediction functions. Continuing in 
the present example, a correlation can be computed with the
following weights for the authors K, L and M.

Author    Weight

K         F(21, 1, not author)
          = log(21 + 1) x log(1 + 1) x G(not author)
          = 0.04

L         F(5, 7, author of D)
          = log(5 + 1) x log(7 + 1) x G(author)
          = 0.70

M         F(0, 2, not author)
          = H(26) x log(2 + 1) x G(not author)
          = 0.02

It is preferred that the logarithm be used as the
monotonically-increasing function and that q = 1, p = 0.1.
Also used are H = log(sample_size * 0.1) and an assumed
rating, for those authors who are unrated by a user, of
"2." The correlation for the author SFI can be
mapped to a non-zero range, so that it can be included in
the cosine similarity metric. This mapping can be provided
by a simple one-neuron neural network, or a linear function
such as (correlation + 1)*P0, where P0 is an optimized
parameter used to produce the predicted ratings with the
lowest error in the given domain for filtering.
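
The author weights in the ongoing example can be checked numerically. The following is a minimal sketch, assuming base-10 logarithms (this base reproduces the stated weights 0.04, 0.70 and 0.02), G(author) = 1 and G(not author) = 0.1, and a weighted Pearson correlation standing in for the "special weighted correlation," whose exact form the text leaves open. The function names and the value chosen for P0 are hypothetical.

```python
import math

G_AUTHOR, G_NOT_AUTHOR = 1.0, 0.1  # assumed: values that reproduce the worked weights
ASSUMED_RATING = 2.0               # rating assumed for an author a user has not rated


def h(total_read):
    """H for a zero sample size: log10(total articles read * 0.1)."""
    return math.log10(total_read * 0.1)


def f(n_a, n_b, is_author, total_a, total_b):
    """Weight F: product of one sample-size term per user, times G."""
    term_a = math.log10(n_a + 1) if n_a > 0 else h(total_a)
    term_b = math.log10(n_b + 1) if n_b > 0 else h(total_b)
    return term_a * term_b * (G_AUTHOR if is_author else G_NOT_AUTHOR)


def weighted_correlation(x, y, w):
    """Weighted Pearson correlation (an assumed stand-in, see lead-in)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


# author: (user A sample size, user B sample size, wrote informon D?)
samples = {"K": (21, 1, False), "L": (5, 7, True), "M": (0, 2, False)}
total_a = sum(s[0] for s in samples.values())  # 26 articles read by user A
total_b = sum(s[1] for s in samples.values())  # 10 articles read by user B

weights = {a: f(na, nb, d, total_a, total_b) for a, (na, nb, d) in samples.items()}

ratings_a = [3.1, 1.2, ASSUMED_RATING]  # user A's ratings of K, L, M (M unrated)
ratings_b = [4.0, 1.3, 5.0]             # user B's ratings of K, L, M
corr = weighted_correlation(ratings_a, ratings_b, list(weights.values()))

P0 = 0.5                             # hypothetical value of the optimized parameter
author_similarity = (corr + 1) * P0  # mapped into the range [0, 2 * P0]
```

Rounded to two places, the computed weights match the figures given for authors K, L and M. The mapped value is non-negative, so it can be placed alongside token weights in a cosine comparison.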

An artisan skilled in information retrieval would 
recognize that there are numerous methods that can be used 
to effect informon comparisons, particularly document 
comparisons. One preferred method is to use a TF-IDF 
weighting technique in conjunction with the cosine 
similarity metric. SFI, including author, can be handled by
including them as another token in the vector. However, the
token is preferred to be weighted by a factor that is
empirically optimized rather than using a TF-IDF approach.
Each component of the relationship between user A's and user
B's ratings can be combined to produce the function to predict the
rating of informon D for user B. The combination function 
can be a simple additive function, a product function, or a 
complex function, including, for example, a neural network 
mapping function, depending upon computational efficiency 
constraints encountered in the application. Optimization of 
the combination function can be achieved by minimizing the 
predicted rating error as an objective. 
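
The TF-IDF and cosine comparison described above can be sketched as follows. This is an illustration only: the sample documents, the `author:K` token name, the helper names, and the constant `AUTHOR_WEIGHT` are hypothetical, and the natural-log IDF is one common variant assumed here, not one the text specifies.

```python
import math
from collections import Counter

AUTHOR_WEIGHT = 0.5  # hypothetical; the text prefers an empirically optimized factor


def tf_idf_vectors(docs):
    """Return one {token: tf * idf} dict per tokenized document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return [
        {tok: count * math.log(n / df[tok]) for tok, count in Counter(doc).items()}
        for doc in docs
    ]


def with_sfi(vector, sfi_token, weight):
    """Add a structured feature (e.g. author) as one more weighted token."""
    out = dict(vector)
    out[sfi_token] = weight
    return out


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(val * v.get(tok, 0.0) for tok, val in u.items())
    nu = math.sqrt(sum(val * val for val in u.values()))
    nv = math.sqrt(sum(val * val for val in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    ["rose", "soil", "garden"],
    ["rose", "garden", "pruning"],
    ["election", "vote", "senate"],
]
v1, v2, v3 = tf_idf_vectors(docs)

tokens_only = cosine(v1, v2)
# Both gardening documents share author K, so the SFI token raises similarity.
with_author = cosine(
    with_sfi(v1, "author:K", AUTHOR_WEIGHT),
    with_sfi(v2, "author:K", AUTHOR_WEIGHT),
)
unrelated = cosine(v1, v3)
```

Because the author token bypasses the IDF term, its influence is controlled entirely by the chosen factor, which is where the empirical optimization would enter.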

In addition to determining the relationship between two
users' ratings, a relationship that can be used and combined
across a large population of users can be developed. This 
relationship is most susceptible to the aforementioned 
first, second, third, and fifth issues in the effective use 
of collaborative input. Specifically, the difficulty with 
specifying a user rating relationship across a large 
population of users is compounded by the lack of a priori 
knowledge regarding a large volume of dynamically changing 
information that may have unexpected correlation and 
therefore grossly violate independence assumptions. 

In one embodiment of the present invention, it is
preferred that users be broken into distributed groups
called "mindpools." Mindpools can be purely hierarchical,
purely parallel, or a combination of both. Mindpools can be
similar to the aforementioned "community" or may instead be
one of many subcommunities. These multiple hierarchies can
be used to represent different qualities of an article.
Some qualities that can be maintained in separate
hierarchies include: interestingness; credibility;
funniness; valuableness; writing quality; violence content;
sexual content; profanity level; business importance;
scientific merit; artistic quality; dramatic appeal;
entertainment value; surprise or unexpectedness of
information content; trendiness or importance to future
directions; and opinion agreement. Each of these qualities
can be optionally addressed by users with a rating feedback
mechanism and, therefore, these qualities can be used to drive
separate mindpool hierarchies. Also, the qualities can be
used in combinations, if appropriate, to develop more
complex composite informon qualities, and more sublime
mindpools.

Figure 7 illustrates a preferred embodiment of a mindpool
hierarchy 500. It is preferred that all users be members of
the uppermost portion of the hierarchy, namely, the top
mindpool 501. Mindpool 501 can be broken into sub-mindpools
502a-c, which separate users into those having at least some
common interests. Furthermore, each sub-mindpool 502a-c can
be respectively broken into sub-sub-mindpools 503a-b, 503c-d,
and 503e,f,g, of which users 504a-g are respective members.
As used herein, mindpool 501 is the parent node to
sub-mindpools 502a-c, and sub-mindpools 502a-c are the
respective parent nodes to sub-sub-mindpools 503a-g.
Sub-mindpools 502a-c are the child nodes to mindpool 501 and
sub-sub-mindpools 503a-g are child nodes to respective
mindpools 502a-c. Mindpools 503a-g can be considered to be
end nodes. Users 505a,b can be members of sub-mindpool 502a,
502c, if such more closely matches their interests than would
membership in a sub-sub-mindpool 503a-g. In general, the
objective is to break down the entire population of users 
into subsets that are optimally similar. For example, the 
set of users who find the same articles about "gardening" by 
author A to be interesting but nevertheless found other 
articles by author A on "gardening" to be uninteresting may 
be joined in one subset. 

A processing means or mindpool manager may be used to 
handle the management of each of the mindpools 501, 502a-c, 
and 503a-g. A mindpool manager performs the following 
functions: (1) receiving rating information from child-node 
mindpool managers and from those users coupled directly to 
the manager; (2) passing rating information or compiled 
statistics of the rating information up to the manager's
parent node, if such exists; (3) receiving estimations of
the mindpool consensus on the rating for an informon from
the manager's parent mindpool, if such exists; (4)
making estimations of the mindpool consensus on the rating
for a specific informon for the users that come under the
manager's domain; and (5) passing the estimations from
function 4 down to either a child-node mindpool or, if the
manager is an end node in the hierarchy, to the respective
user's CWF, for producing the user's predicted rating.
Function 4 also can include combining the estimations
received from the manager's parent node, and Uncertainty
Predictions can be estimated based on sample size, standard
deviation, etc. Furthermore, as alluded to above, users can
be allowed to belong to more than one mindpool if they don't
fit precisely into one mindpool but have multiple views
regarding the conceptual domain of the informon. Also, it
is preferred that lateral communication be provided between
peer managers who have similar users beneath them, to share
estimation information. When a rating comes in from a user,
it can be passed to the immediate manager(s) node above that
user. It is preferred that the manager(s) first decide
whether the rating will affect its current estimation or
whether the statistics should be passed upward to a
parent-node. If the manager estimation would change by an amount
above an empirically-derived minimum threshold, then the
manager should pass that estimation down to all of its
child-nodes. In the event that the compiled statistics are
changed by more than another minimum threshold amount, then
the compiled statistics should be passed to the manager's
parent-node, if any, and the process recurses upward and
downward in the hierarchy.
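
The threshold-driven passing of estimations and statistics can be sketched as below. This is a simplified illustration for a single informon, not the specification's implementation: the class and attribute names, the running-mean estimate, the count-based upward threshold, and both threshold values are assumptions, and the combination with the parent's estimation and the Uncertainty Predictions are omitted for brevity.

```python
class MindpoolManager:
    """One node of a mindpool hierarchy, tracking one informon's ratings."""

    def __init__(self, name, parent=None, down_threshold=0.25, up_threshold=2):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)
        self.rating_sum = 0.0         # compiled statistics held by this node
        self.rating_count = 0
        self.pending_sum = 0.0        # statistics not yet passed to the parent
        self.pending_count = 0
        self.estimate = None          # last estimation published downward
        self.parent_estimate = None   # last estimation received from the parent
        self.down_threshold = down_threshold  # hypothetical minimum estimate change
        self.up_threshold = up_threshold      # hypothetical minimum statistics change

    def receive(self, delta_sum, delta_count):
        """Accept ratings from users or compiled statistics from a child node."""
        self.rating_sum += delta_sum
        self.rating_count += delta_count
        self.pending_sum += delta_sum
        self.pending_count += delta_count

        new_estimate = self.rating_sum / self.rating_count
        changed = (self.estimate is None
                   or abs(new_estimate - self.estimate) > self.down_threshold)
        if changed:
            self.publish(new_estimate)

        if self.parent is not None and self.pending_count >= self.up_threshold:
            ds, dc = self.pending_sum, self.pending_count
            self.pending_sum, self.pending_count = 0.0, 0
            self.parent.receive(ds, dc)   # recurse upward

    def publish(self, estimate):
        """Pass a changed estimation down to all child nodes."""
        self.estimate = estimate
        for child in self.children:
            child.parent_estimate = estimate


top = MindpoolManager("501")
sub = MindpoolManager("502a", parent=top)
leaf = MindpoolManager("503a", parent=sub)

leaf.receive(4.0, 1)   # first rating: estimate set, nothing passed upward yet
leaf.receive(5.0, 1)   # second rating: statistics now cross the upward threshold
leaf.receive(4.5, 1)   # estimate unchanged and below threshold: no traffic
```

In this run the third rating leaves the estimate at 4.5 and generates no messages, which illustrates how the estimations settle and updates to other nodes are minimized.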

Because no mindpool manager is required to have
accurate information, but just an estimation of the rating
and an uncertainty level, any manager may respond with a
simple average of all previous documents, and with a higher
degree of uncertainty, if none of its child-nodes has any
rating information yet. The preferred distributed strategy
tends to reduce the communication needed between processors,
and the computation tends to be pooled, thereby eliminating
a substantial degree of redundancy. Using this distributed
strategy, the estimations tend to settle to the extent that
the updating of other nodes, and the other users' predictions,
are minimized. Therefore, as the number of informons and
users becomes large, the computation and prediction updates
grow as the sum of the number of informons and the number of
users, rather than the product of the number of informons
and the number of users. In addition, incremental updates
can be accomplished by the passing of estimations up and
down the hierarchy. Incremental updates of rating predictions
continue to move until the prediction becomes stable due to
the large sample size. The distributed division of users
can reduce the effects of independence assumption violations.
In the previous example with the B-Team of ten users, the
B-Team can be organized as a particular mindpool. With the
additional ratings from each of the B-Team members, the
estimation from the B-Team mindpool typically does not
change significantly because of the exact correlation
between the members of that mindpool. This single
estimation then can be combined with other estimations to
achieve the desired result, regardless of how many B-Team
members have read the article at any given time.

The mindpool hierarchies can be created by either
computer- or human-guided methods. If the hierarchy
creation is human-guided, there often is a natural breakdown
of people based on information such as job position, common
interests, or any other information that is known about
them. Where the mindpool hierarchy is created
automatically, the previously described measure of
the collaborative input relationship between users can be
employed in a standard hierarchical clustering algorithm to
produce each group of users or nodes in the mindpool
hierarchy. Such standard hierarchical clustering algorithms
can include, for example, the agglomerative method, or the
divide-and-conquer method. A skilled artisan would
recognize that many other techniques also are available for
incrementally adjusting the clusters as new information is
collected. Typically, clustering is intended to (1) bring
together users whose rating information is clearly not
independent; and (2) produce mindpool estimations that are
substantially independent among one another.
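
As one illustration of the agglomerative method, the sketch below clusters users from a pairwise similarity matrix. It assumes average linkage and a stopping threshold; the matrix values, the function name, and `min_similarity` are hypothetical, and the similarity entries stand in for the collaborative input relationship measure described earlier.

```python
def agglomerative(similarity, min_similarity):
    """Average-linkage agglomerative clustering over a user similarity matrix.

    Repeatedly merges the two most similar clusters until no pair reaches
    min_similarity. Returns the final clusters and the merge history, which
    can be read as the levels of a hierarchy.
    """
    clusters = [[i] for i in range(len(similarity))]
    merges = []

    def link(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        score, ia, ib = max(
            ((link(a, b), ia, ib)
             for ia, a in enumerate(clusters)
             for ib, b in enumerate(clusters) if ia < ib),
            key=lambda t: t[0],
        )
        if score < min_similarity:
            break
        merges.append((clusters[ia], clusters[ib], score))
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return clusters, merges


# Four users: 0 and 1 rate alike, 2 and 3 rate alike (illustrative values).
S = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters, merges = agglomerative(S, min_similarity=0.5)
```

Here the two resulting groups are internally correlated but only weakly similar to each other, which is the stated aim of the clustering.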

Estimations are made in a manner similar to other
estimations described herein. For example, for each user or
sub-mindpool (sub-informant), a similarity between the
sub-informant and the centroid of the mindpool can be computed
in order to determine how relevant the sub-informant is in
computing the estimation. Uncertainty estimators also are
associated with these sub-informants, so that they can be
weighted with respect to their reliability in providing the
most accurate estimation. Optionally, the informon under
evaluation can be used to modulate the relevancy of a
sub-informant. This type of evaluation also can take advantage
of the two previously-determined collaborative information
relationship components, thereby tending to magnify
relationships that are stronger for particular types of
informons than for others. Once a suitable set of weights
is established for each user within a mindpool for a
particular informon, a simple weighted-average can be used
to make the estimation. It is preferred that the "simple"
weighted average used be more conservative regarding input
information than a simple independent linear regression.
Also, the overall Uncertainty can be derived from the
Uncertainty Predictions of the sub-informants, in a manner
similar to the production of other uncertainty combination
methods described above. Approximations can be made by
pre-computing all terms that do not change significantly, based
on the particular informon, or the subset of actual ratings
given so far to the mindpool manager.
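
A weighted-average estimation of this kind might look like the following sketch. The weighting rule (similarity to the centroid divided by uncertainty) and the treatment of each uncertainty as the variance of an independent estimator are assumptions made for illustration; the text does not fix either choice, and the function name is hypothetical.

```python
def mindpool_estimate(sub_informants):
    """Combine sub-informant estimations into one mindpool estimation.

    Each entry is (estimate, similarity_to_centroid, uncertainty), with
    uncertainty > 0. Weights grow with similarity and shrink with
    uncertainty (an assumed rule).
    """
    weights = [sim / unc for _, sim, unc in sub_informants]
    total = sum(weights)
    estimate = sum(
        w * est for w, (est, _, _) in zip(weights, sub_informants)
    ) / total
    # Assumption: uncertainties behave like variances of independent inputs.
    uncertainty = sum(
        (w / total) ** 2 * unc for w, (_, _, unc) in zip(weights, sub_informants)
    )
    return estimate, uncertainty


# Two equally relevant, equally reliable sub-informants.
even_estimate, even_uncertainty = mindpool_estimate(
    [(4.0, 1.0, 1.0), (2.0, 1.0, 1.0)]
)

# A relevant, reliable sub-informant dominates a distant, uncertain one.
skewed_estimate, _ = mindpool_estimate([(4.0, 0.9, 0.5), (2.0, 0.3, 2.0)])
```

With equal weights the estimation is the plain mean and the combined uncertainty is halved; with unequal weights the estimation moves toward the more relevant and more reliable sub-informant.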

As stated previously, the correlated-feature
error-correction units (CFECUs) are intended to detect
irregularities or statistical exceptions. Indeed, the
objectives of the CFECUs are to (1) find non-linear
exceptions to the general structure of the three
aforementioned types of inputs (SFI, UFI, and CI); (2)
find particular combinations of informon sub-features that
statistically stand out as having special structure which is
not captured by the rest of the general model; and (3)
trigger an additional signal when the CFECU's conditions are
met, in order to reduce prediction error. An example of the
CFECU operation is given presently.

                      User B's Avg. Rating of
                      Informons About
                      Gardening    Politics

Author A's Articles   4.5          1.2
Other Authors         1.4          2
Weighted by Topic     1.68         1.87


                      User B's Number of
                      Informons Read About         Average over
                      Gardening    Politics        Topics

Author A's Articles   7            40              1.69
Other Authors         70           200             1.84



In this example, it is desired that author A's informon D 
about gardening have a high predicted rating for user B. 
However, because the average rating for author A by user B 
is only 1.69, and the average rating for the gardening 
concept is only 1.68, a three-part model (SFI-UFI-CI) that 
does not evaluate the informon features in combination would 
tend to not rank informon D very highly. In this case, the 
first CFECU would first find sources of error in past 
examples. This could include using the three-part model 
against the known examples that user B has rated so far. In 
this example, the seven gardening articles by author A that
user B has rated have an average rating of 4.5, even though the
three-part model predicts a rating of only about 1.68. When such a large error
appears, and has statistical strength due to the number of 
examples with the common characteristics of, for example, 
the same author and topic, a CFECU is created to identify 
that this exception to the three-part model has been 
triggered and that a correction signal is needed. Second, 
it is preferred to index the new CFECU into a database so 
that, when triggering features appear in an informon, for 
example, author and topic, the correction signal is sent
into the appropriate CWF. One method which can be used to
effect the first step is a cascade correlation neural
network, in which the neural net finds new connection neural
net units to progressively reduce the prediction error.
Another method is to search through each informon that has
been rated but whose predicted rating has a high error, and
to store the informon's profile.
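
The averages in the example can be reproduced by weighting each cell's average rating by the number of informons read. The short sketch below does this and computes the gap that a model using only the topic average would leave for author A's gardening articles; the variable and function names are illustrative.

```python
# (author group, topic): (user B's average rating, number of informons read)
ratings = {
    ("A", "gardening"): (4.5, 7),
    ("A", "politics"): (1.2, 40),
    ("other", "gardening"): (1.4, 70),
    ("other", "politics"): (2.0, 200),
}


def marginal(index, key):
    """Count-weighted average rating over all cells matching one feature."""
    cells = [(avg, n) for k, (avg, n) in ratings.items() if k[index] == key]
    return sum(avg * n for avg, n in cells) / sum(n for _, n in cells)


gardening_avg = marginal(1, "gardening")  # weighted by topic
politics_avg = marginal(1, "politics")
author_a_avg = marginal(0, "A")           # averaged over topics
other_avg = marginal(0, "other")

# Error left when only the topic average is used for author A on gardening.
residual = ratings[("A", "gardening")][0] - gardening_avg
```

Neither the author average nor the topic average comes close to 4.5, so the exception is only visible when the two features are evaluated in combination.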

When "enough" informons have been found with high error 
and common characteristics, the common characteristics can 
be joined together as a candidate for a new CFECU. Next, 
the candidate can be tested on all the samples, whether they
have a high or a low prediction error associated
with them. Then, the overall error change (reduction or 
increase) for all of the examples can be computed to 
determine if the CFECU should be added to the informon 
profile. If the estimated error reduction is greater than a 
minimum threshold level, the CFECU can be added to the 
profile. As successful CFECUs are discovered for users'
profiles, they also can be added to a database of CFECUs
that may be useful for analyzing other profiles. If a 
particular CFECU has a sufficiently broad application, it 
can be moved up in the filtering process, so that it is 
computed for every entity once. Also, the particular CFECU 
can be included in the representation that is computed in 
the pre-processing stage as a new feature. In general, the
estimation of the predicted rating from a particular CFECU 
can be made by taking the average of those informons for 
which the CFECU responds. Also, the Uncertainty can be 
chosen such that the CFECU signal optimally outweighs the 
other signals being sent to the CWF. One method of self- 
optimization that can be employed is, for example, the 
gradient descent method, although a skilled artisan would 
recognize that other appropriate optimization methods may be
used.
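
The search-and-test procedure for a candidate CFECU can be sketched as follows. This is a simplified illustration: the grouping key (author, topic), the thresholds, the sample records, and the function names are hypothetical, and the correction here simply replaces the prediction rather than being weighed against other signals in the CWF through its Uncertainty.

```python
from collections import defaultdict


def find_candidates(examples, error_threshold, min_count):
    """Group high-error informons by (author, topic); keep frequent groups."""
    groups = defaultdict(list)
    for ex in examples:
        if abs(ex["actual"] - ex["predicted"]) > error_threshold:
            groups[(ex["author"], ex["topic"])].append(ex)
    return [key for key, members in groups.items() if len(members) >= min_count]


def error_reduction(examples, key):
    """Mean absolute error before minus after applying the candidate CFECU.

    The candidate's estimate is the average actual rating of the informons
    to which it responds; all other informons keep their prediction.
    """
    hits = [ex for ex in examples if (ex["author"], ex["topic"]) == key]
    correction = sum(ex["actual"] for ex in hits) / len(hits)
    before = sum(abs(ex["actual"] - ex["predicted"]) for ex in examples)
    after = sum(
        abs(ex["actual"] - (correction if (ex["author"], ex["topic"]) == key
                            else ex["predicted"]))
        for ex in examples
    )
    return (before - after) / len(examples)


# Illustrative records: seven poorly predicted articles, three well predicted.
examples = (
    [{"author": "A", "topic": "gardening", "actual": 4.5, "predicted": 1.68}] * 7
    + [{"author": "other", "topic": "gardening", "actual": 1.4, "predicted": 1.68}] * 3
)

candidates = find_candidates(examples, error_threshold=1.0, min_count=5)
reduction = error_reduction(examples, candidates[0])
MIN_REDUCTION = 0.5  # hypothetical minimum threshold level
add_to_profile = reduction > MIN_REDUCTION
```

Testing the candidate on all samples, not only the high-error ones, guards against a correction that fixes the exceptions while degrading predictions elsewhere.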

All publications mentioned in this specification are
indicative of the level of skill in the art to which this
invention pertains. All publications are herein
incorporated by reference to the same extent as if each
individual publication was specifically and individually
indicated to be incorporated by reference.

Furthermore, many alterations and modifications may be
made by those having ordinary skill in the art without
departing from the spirit and scope of the invention.
Therefore, it must be understood that the illustrated
embodiments have been set forth only for the purposes of
example, and that they should not be taken as limiting the
invention as defined by the following claims. The following
claims are, therefore, to be read to include not only the
combination of elements which are literally set forth but
all equivalent elements for performing substantially the
same function in substantially the same way to obtain
substantially the same result. The claims are thus to be
understood to include what is specifically illustrated and
described above, what is conceptually equivalent, and also
what incorporates the essential idea of the invention.